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This  thesis  presents  a  general  computational  method,  amenable  for 
an  efficient  implementation  in  digital  computing  systems.   The  method 
provides  a  unique,  simple  and  fast  algorithm  for  solving  many  computational 
problems,  such  as  the  evaluation  of  polynomials,  rational  functions  and 
arithmetic  expressions,  or  solving  a  class  of  systems  of  linear  equations, 
or  performing  the  basic  arithmetics.   In  particular,  the  method  is  well- 
suited  for  fast  evaluation  of  commonly  used  mathematical  functions.   The 
method  consists  of  i)  a  correspondence  rule  which  reduces  a  given 
computational  problem  f  into  a  system  of  linear  equations  L  and  ii)  an 
algorithm  which  generates  the  solution  to  the  system  L  and,  hence,  to  the 
problem  f,  in  0(m)  addition  steps  with  an  m-digit  precision.   However, 
the  time  to  execute  each  addition  step  is  independent  of  the  operand 
precision.   The  algorithm  is  deterministic  and  always  generates  the  result 
in  a  digit -by-digit  fashion,  the  most  significant  digit  appearing  first. 
Therefore,  the  algorithm  provides  for  an  overlap  in  a  sequence  of 
computations  as  well  as  for  a  variable  precision  operation.   The  method, 
in  general,  has  favorable  error  properties  and  simple  implementation 
requirements.   It  is  believed  that  the  proposed  method  represents  not  only 
a  practical  and  widely  applicable  computational  approach  but  also  an 
effective  technique  for  reducing  the  computational  complexity  and 
increasing  the  speed  of  numerical  algorithms  in  general. 
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1.   INTRODUCTION 

1.1  Motivations,  Objectives  and  Related  Work 

The  subject  of  this  dissertation  is  a  novel  computational  method, 
motivated  by  a  desire  to  demonstrate  an  approach  in  designing  fast 
algorithms  for  numerical  computations  as  a  viable  alternative  to  the 
commonly  known  parallel  algorithms,  requiring  multiprocessor  systems  on 
one  side,  and  to  the  strictly  hardware -oriented  algorithms  on  the  other 
side. 

Fast  algorithms  are  of  basic  interest  in  the  theory  of  computation 
and  of  an  increasingly  practical  importance  in  the  organization  and 
design  of  computing  systems.  With  respect  to  implementation,  it  is 
convenient  to  classify  here  fast  methods  as  i)  those  which  use  a 
multiplicity  of  general-purpose  processors  with  the  corresponding 
algorithms  specified  on  the  software  (instruction)  level,  and  ii)  those 
which  require  special-purpose  processors  with  algorithms  embedded  in  the 
hardware  (operation)  level.  By  definition,  the  first  class  carries 
with  it  a  notion  of  generality  while  the  second  class  implies  "limitations 
of  the  application  domain.  Due  to  implementation  properties,  the  former 
class  cannot  achieve  the  speed  and  the  efficiency  of  the  latter  with 
respect  to  a  given  algorithm. 

The  evident  progress  of  hardware  technology  does  enhance  the 
performance  of  the  methods  in  both  classes  but  the  problem  of  algorithm 


design,  which  is  optimal  with  respect  to  the  available  technology,  becomes, 
in  some  sense,  even  more  challenging.   That  is,  as  the  cost  and  attainable 
complexity  of  hardware  change,  what  are  the  optimal  algorithms  and  primitive 
operators?  It  is,  perhaps,  a  convenience,  from  a  human  point  of  view,  to 
insist  on  implementing  all  and  only  four  basic  arithmetics  as  the  primitive 
operators,  but  that  hardly  proves  the  convenience  with  respect  to  the 
implementation  environment. 

When  considering  a  computational  method  for  possible  implementation, 
one  is  usually  concerned  with  i)  its  application  domain,  ii)  the  required 
set  of  algorithms,  and  iii)  the  required  set  of  primitive  operators. 
With  these,  one  also  associates  a  set  of  desired  or  required  properties, 
for  instance,  the  speed,  the  complexity  and  the  cost  of  implementation, 
numerical  characteristics  of  the  algorithms,  etc.   Ideally,  an  implemented 
computational  method  should  have  as  large  a  domain  of  application  as 
possible,  a  single  but  simple  algorithm  and  only  those  primitive  operators 
which  are  efficiently  realizable  in  the  given  implementation  technology. 

The  objective  here  is  to  define  a  method  which  would  have  a 
sufficient  generality  in  applications  and  such  functional  properties 
in  order  to  justify,  without  difficulties,  a  hardware -level  implementation. 
In  other  words,  the  intention  is  to  combine  the  favorable  properties  of 
the  two  previously  mentioned  classes  of  computational  methods. 

One  of  the  original  motivations  was  the  problem  of  fast  evaluation 
of  commonly  used  mathematical  functions.   The  proposed  method  evolved 
while  attempting  to  solve  this  problem  in  a  new  way.   For  that  reason, 
we  will  restrict  our  attention  in  the  remainder  of  the  present  chapter 
to  some  of  the  known  practical  methods  of  evaluation  of  functions.  One 


class  of  these  methods  is  based  on  the  classical  approximation  techniques, 
in  particular,  the  minimax  or  near-minimax  polynomial  or  rational 
approximations  [HAR68].   These  methods,  for  all  practical  purposes,  can 
be  considered  general.   The  other  class  contains  those  methods  which  are 
based  on  certain  specific  properties  of  the  functions  being  evaluated:  they 
are  devised  with  implementation  efficiency  and  speed  as  objectives  but 

they  have  a  limited  domain  of  application  by  definition  [FRA73].   For  the 
former  class,  since  the  approximation  problem  can  be  assumed  to  be  solved, 
one  is  concerned  only  with  the  problem  of  efficient  evaluation  of  the 
corresponding  approximating  functions.   For  polynomials,  a  direct 
hardware-implemented  evaluation  scheme,  based  on  Horner's  or  Clenshaw's 
recurrences,  suitable  for  the  summation  of  the  Chebyshev  series  [CLE62] 
in  time  0(n)  multiplications,  gives  only  a  slight  improvement  in  performance 
over  the  software  versions  so  that  its  implementation,  except  in 
microprogrammed  machines,  is  hardly  justified.   The  fast  polynomial 
evaluation  schemes,  on  the  other  hand,  provide  a  significant  gain  in  speed 
at  the  expense  of  a  considerable  hardware  complexity.   Tung  [TUN68] 
considered  the  implementation  of  an  efficient  polynomial  evaluator,  based 
on  the  fast  primitive  operators  and  a  redundant  number  representation.   The 
proposed  structure  is  versatile  but  complicated  on  the  level  of  the  basic 
building  block  as  well  as  in  the  overall  control  and  intercommunication 
requirements.   For  the  rational  functions,  fast  parallel  schemes  offer 
even  less  efficiency  yet  the  rational  approximations  would  be  preferable 
in  many  instances.  The  method,  proposed  here,  has  the  same  generality 
in  such  an  application  but  offers  a  better  performance.   Its  algorithm  is 

simple  and  requires  two  (three)  operand  additions  as  the  primitive 


operator  for  the  evaluation  of  polynomial  (rational)  functions.  A 
corresponding  implementation  can  be  simple,  yet  the  time  required  for 
evaluation  is  of  the  order  of  one  carry-save  type  multiplication  time,  if  the 
coefficients  of  the  approximating  functions  satisfy  certain  range  conditions. 
Otherwise,  the  required  scaling  will  cause  a  logarithmic  extension  in  the 
working  precision.   In  addition  to  a  simple  basic  computing  block,  the 
interconnection  requirements  of  this  method  are  much  simpler  than  those 
of  the  above  mentioned  methods. 

The  second  class  of  methods  being  considered,  as  mentioned  earlier, 
includes  those  methods  which  utilize  certain  functional  properties  to 
achieve  an  efficient  and  fast  implementation.  Although  these  methods 
appear  limited  in  application,  some  of  them  can  generate  enough  different 
functions  in  order  to  justify  a  hardware  implementation.   Their  efficiency 
comes  from  simple  computational  algorithms  and  primitive  operators,  like 
add  and  shift,  which  can  be  conveniently  implemented.   The  proposed 
method  appears  even  better  in  this  respect:   its  computational  algorithm 
is  simpler  and  problem  invariant.   There  is  no  shift  operator,  which  in 
many  cases  must  have  a  variable  shifting  capability.  When  a  redundant 
representation  is  introduced  in  order  to  make  the  basic  computation  step 
independent  in  time  of  the  length  of  the  operands,  a  variable  shift 
operator  can  considerably  affect  the  complexity  of  implementation.   Some 
of  the  most  important  methods  in  this  class  are:  Voider 's  coordinate 
rotation  technique  (CORDIC),  described  in  [VOL59]  and  later  generalized, 
in  the  form  of  a  unified  algorithm  in  [WAL71] ;  the  normalization  methods, 
based  on  an  iterative  co- transformation  of  a  number  pair  (x,y)  such  that 
a  function  f(x,y)  remains  invariant  as  proposed  in  [SPE65,  DEL70,  CHE72]; 


and  the  pseudo- division  and  the  pseudo-multiplication  methods  for  some 
elementary  functions  as  described  in  [MEG62].  A  combination  of  some  of 
these  methods  provides  for  fast  evaluation  of  the  most  often  used  elementary 
functions  (for  instance  square  roots,  logarithms,  exponentials,  trigono- 
metric, hyperbolic  and  their  inverse  functions).   The  method  proposed  here 
has  even  more  versatility  in  this  respect  and  it  is  generally  comparable  in 
speed.  Although  some  of  the  methods  mentioned  above  may  use  fewer 
implementation  resources,  the  proposed  method  excels  in  simplicity  with 
respect  to  the  basic  computing  block  and  overall  structure.  Moreover, 
the  proposed  method  can  be  applied  in  solving  problems  other  than  function 
evaluation.   Certain  arithmetic  expressions,  multiple  products  and  sums, 
inner  products,  integral  powers  and  solving  of  systems  of  linear  equations 
under  certain  conditions,  are  among  the  possible  applications.  Basic 
arithmetics,  in  particular,  multiplication  and  division  are  easily 
performed  by  this  method.   Furthermore,  it  has  useful  functional  properties: 
its  computational  algorithm  is  problem-  and  step- in variant;  the  results 
are  generated  in  a  digit-by-digit  fashion  with  the  most  significant  digits 
appearing  first  so  that  an  overlap  of  computations  can  be  utilized.   These 
properties  make  the  method  suitable  for  a  variable  precision  mode  of 
operation.   The  algorithm  itself  shares  some  properties  with  the  incremental 
computations  in  the  digital  differential  analyzers  (DBA)  although  it  is 
not  based  on  an  integration  principle.   The  potential  of  such  an  algorithm, 
using  systematically  variable  powers  of  two  increments  rather  than 
constant  increments  as  in  BBA,  and  its  possible  application  in  implementing 
multiprocessing  systems  have  been  recognized  in  [CAM69a,  CAM69b,  CAM70]. 
Campeau  also  recognized  the  possibility  of  solving  a  linear  system 


iteratively  in  a  left-to-right  mode  using  a  DDA-like  configuration  but  not 
the  constant  increments. 

1.2  Dissertation  Overview 

The  main  result  of  this  work  is  presented  in  Chapter  2,  as  a  general 
computational  method.   In  its  exposition,  we  have  emphasized  the  basic 
principles  of  the  method  and  presented  those  details  which  we  found  to 
have  direct  implications  on  the  performance  of  the  method.   Several 
implementation  aspects  are  considered  in  Chapter  3.   The  physical  counter- 
part of  the  basic  computational  expression  of  the  method,  the  elementary 
unit,  is  defined  in  relation  to  a  graph  model  representation  of  the  entire 
method.   Certain  properties  of  implementation,  such  as  its  flexibility 
and  modularity,  are  discussed  in  sufficient  detail.   In  Chapter  h,    an 
attempt  is  made  to  illustrate  the  applicability  of  the  proposed  method  in 
various  problems.   For  example,  the  evaluation  of  polynomials  and  rational 
polynomial  functions  is  considered  in  general  and  as  a  basis  for  the 
evaluation  of  various  functions.   Basic  arithmetics  and  certain  types  of 
arithmetic  expressions  are  shown  to  be  compatible  with  the  proposed  method. 
Finally,  the  application  of  the  method  in  solving  the  systems  of  linear 
equations  is  considered  in  some  detail.  A  summary  of  the  results  and  a 
discussion  of  the  possible  implications  appears  in  the  final  chapter. 


2.   THE  EVALUATION  METHOD 

2 . 1  Introduction 

In  this  chapter  a  general  evaluation  technique,  named  E-method, 

is  introduced.   The  E-method,  in  general  terms,  can  he  described  as 
(i)  A  correspondence  rule,  C  ,  which  associates  independent 

variables  x  ,  dependent  variables  v_  ,  and  parameters  p_  of 
a  given  computational  problem  f(xf.^P_f.)  with  a  system  L  of 
simultaneous  linear  equations  ApV_  =  b  in  such  a  way  that 
there  is  a  one-one  correspondence  between  dependent  variables 
y  ,  i.e.,  the  results  of  f,  and  the  solution  y_  of  the  system  L. 
The  elements  of  the  matrix  A  and  vector  b  must  satisfy 
certain  conditions,  as  specified  later.   Symbolically, 

(cf  =>Af,bf )  =}  (£f  <4  y_  =  A^bf ) 

(ii)  A  computational  algorithm  for  solving  the  system  L  in  time 

linearly  proportional  to  the  desired  number  of  correct  digits 
of  the  solution  y_,  and  which  is  amenable  to  an  efficient 
implementation. 
A  computational  problem  f  is  said  to  be  L-reducible  if  there  is  a 
corresponding  rule  C  ,  not  necessarily  unique.   The  E-method  is  applicable 
in  all  L-reducible  problems:  the  computational  algorithm  remains  invariant 
while  the  particular  correspondence  rule,  no  more  complex  than  the 
assignment  of  values,  characterizes  the  problem. 
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The  choice  of  a  linear  system  as  the  target  of  correspondence  stems 
from  an  observation  that  the  expansion  of  an  n-th  order  determinant  has 
the  form  of  a  sum  of  n!  terms,  each  term  being  a  product  of  n  factors. 
Since  the  solution  of  a  linear  system  L  appears  as  the  ratio  of  the 
corresponding  determinants,  there  is  an  obvious  potential  to  represent 
and  accordingly  evaluate  certain  general  arithmetic  expressions,  rational 
functions  in  particular,  as  the  ratios  of  determinants  in  expanded  form. 

The  exposition  of  the  E-method  in  this  chapter  closely  follows  the 

order  in  which  the  fundamental  ideas  were  developed.   Thus  the  problem  of 

evaluating  rational  functions,  which  alone  is  of  sufficient  importance, 

will  be  used  to  introduce  and  demonstrate  the  correspondence  part  of  the 

E-method.   Its  correspondence  rule,  C  ,  will  be  defined  in  the  next 

R 

section  while  the  computational  algorithm  will  be  given  in  Section  2.7 
after  discussing  in  some  detail  what  appears  to  be  the  generic  problem 
for  the  E-method.   The  generic  problem  and  some  of  the  associated  concepts 
are  investigated  in  Sections  2.3-2.6. 

2.2  The  Correspondence  Problem  of  the  E-method 

A  simple  way  of  establishing  the  correspondence  C  between  the 

K 

coefficients  and  the  argument  of  a  given  rational  function  R   (x)  and 
a  system  L  of  simultaneous  linear  equations,  such  that  the  value  of  R    is 
computed  as  the  first  component  of  the  solution  vector  y_,  is  described 
as  a  general  example  of  the  correspondence  problem  of  the  E-method. 
Let  R   (x)  be  a  real- valued  rational  function: 

y  ± 

P  (x)     .Zn  Pi  X 

R   W  =  7TT-T  =  — (2-1) 

Z  q.  x 
ioO  1 


Without  loss  of  generality  it  is  assumed  that  q.n=l.   let 

A(x)  y_  =  b  (2.2) 

be  a  nonhomogeneous  system  of  n  simultaneous  linear  equations  with 

A(x)  -  (a.  .)  v  (2.3) 

-  the  nonsingular  system  coefficient  matrix; 

y  =  [y-^y^...,^]  (2.^) 

-  the  solution  vector  and 

b  =  [b1,b2,...,bn]  (2.5) 

-  the  right-hand  side  vector. 
Let  D(x)  denote  the  determinant  of  A(x): 

D(x)  =  det  A(x)  (2.6) 

Similarly, 

D  (x)  =  det^ag,...,^.   ,b,a  x,  ...,an)  (2.7) 


where 


a.  =  (a,  .,a0  ., .  ,.,a  .)  (2.8) 

-J     lj  2j'     no7 


is  the  j-th  column  vector. 
Theorem  2.1 


If  max(u,v)  <  n-1  and  the  coefficients  a. .'s,  b.  *s  of  the  system 

(2.2)  are  put  into  correspondence  with  the  coefficients  p.  '  s,  q. '  s  and 

the  argument  x  according  to  the  following  rule  C  :  ' 

R 


10 


r 


a.  . 
13 


^ 


-x 

0 


for  i  =  j; 

for  j  =  1  and  i  =  2, 3>...,V+1; 

for  j  =  i+1  and  i  =  l,2,...,n-l; 

otherwise; 


(2.9) 


p.     for  i  =  1,2,  .  ,.,n+l; 

0      otherwise, 


(2.10) 


then 


D1(x) 

yi(x)    =   -d[xT 


P  (x) 

TTT^  =  R   (x) 
Qy(x)      ^,vv 


(2.11) 


Proof: 


and 


By  the  Laplace  expansion  of  the  determinants 

n 

D(x)  =  £  a. ,  c...  (x) 

.  ,   ll  11 
1=1 


n 


D1(x)  =  Z     bi  ci:L(x) 
i=l 


where 


i+1 


Cil(x)  =  (-1)^  det  Ai3_(x) 


(2.12) 


:2.13) 


(2.1U) 


is  the  cofactor  of  the  element  a.,,  and  det  A.,(x)  is  its  corresponding 
minor,  defined  in  the  usual  way.   In  general, 


!±1w  -  < 


f    n 

k=2   ^ 


V. 


(-D 


i+1 


ri-1 


,nA,k+l 
lk=l 


n 
_k=i+l   J 


for  i=l; 


for  i=2,3> . .  .,n  . 


(2.15) 


n 


In  particular, 


and,  since 


for  1=1; 

c±1(x)  =  {  ^  (2.16) 

(-x)1"1    for  i=2,3,...,n 


/  n \i+lr    ni-1      i-1 

(-1)   [-x]    =  X 


it  immediately  follows  that 

n 


and 


Therefore, 


D(x)  =  E  a   c   (x)  (2.17) 

i=l 


v+1      .  , 
=  I  +  Z  a.   x1" 
i=2  tL~1 


=  1  +  Z  q.  x 
i=l 


Qv(x) 


n 

LJx)   =  Z  b.  c,(x)  (2.18) 

1      .  ,   l  ll 
1=1 


^       i-1 

=  *  Pi-1  X 

1=1 


=  Z  p.  x 
i=0  i 


=  P  (x)  . 


D  (x)     P  (x) 

yl(x)  =  "DlxT  =  qT*7  =  V(X) 


a 
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Theorem  2.1  establishes  the  correspondence  rule  CR  so  that  the 

E-method  can  be  applied  to  evaluate  a  given  rational  function  R   (x). 

The  correspondence  rules  for  several  other  representative  problems  will 

be  given  in  Chapter  h.      Figure  2.1  illustrates  how  the  system  L:  A(x)  v_  =  b 

appears  after  an  initialization  has  been  performed  according  to  the 

correspondence  rule  C  . 

R 

It  can  be  noted  that  the  correspondence  rule  C  is  degenerate  in 

n 

a  sense  that  only  one  of  the  n  generated  components  y.,  i  =  1, .  ..,n, 
namely,  y  is  of  interest. 

2.3  The  Generic  Problem 

The  proposed  computational  algorithm  of  the  E-method  conveniently 
appears  to  be  a  generalization  of  a  solution  to  a  rather  simple  problem. 
Namely,  the  problem  of  evaluating  a  linear  function,  subject  to  certain 
conditions,  may  be  considered  here  as  the  generic  problem  in  the  sense 
that  the  functional  properties  of  its  algorithm  are  preserved  in  the 
algorithm  of  the  E-method.  Moreover,  the  exposition  of  the  computational 
technique  for  the  generic  problem,  being  a  scalar  type,  proves  to  be 
straightforward  and  it  is  immediately  extendable  to  vector  type  parallel 
algorithms,  as  the  one  used  in  the  E-method. 

Consider  the  linear  function 

y  =  ax  +  b 

where  a  and  b  are  coefficients  and  x  is  the  argument.  While  y  could  be 
trivially  evaluated  with  the  help  of  any  multiplication  algorithm,  the 
following  set  of  imposed  properties  makes  the  problem  of  evaluating  y 
relevant  for  our  purposes. 
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Figure  2.1     Correspondence  Rule  C 
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Ik 


Property  1.   The  algorithm  must  generate  the  most  significant 
digits  of  y  first  in  such  a  way  that  once  generated,  digit  y. 

J 

at  the  step  j  will  not  be  affected  by  any  subsequent  step  k, 

k  >  J5 

Property  2.   The  basic  computational  step  should  be  invariant 

with  the  only  primitive  arithmetic  operation  being  addition; 

the  selection  procedure  which  generates  one  digit  of  the  result 

per  step  should  be  deterministic  and  feasible  on  a  limited 

precision  so  that  the  step  execution  time  is  independent 

of  the  length  of  operands. 

Property  3.   The  algorithm  should  have  an  "on-line"  capability 

with  respect  to  the  independent  variable.   Namely,  if 

{x.|i  =  1,2,... ,m}  are  the  digits  of  the  independent  variable 

x  then  only  the  digit  x.  need  be  used  at  the  (j+l)st  step. 

J 

The  first  property,  essential  for  the  computational  approach  of  the  E-method, 
provides,  in  a  simple  yet  effective  way  for  the  reduction  of  computation 
per  step  by  preventing  further  involvement  of  previously  computed  entities. 
It  implies  a  redundant  representation  of,  at  least,  the  result  and  it  also 
indicates  that  a  corresponding  algorithm  will  belong  to  the  class  of 
digit-by-digit  algorithms,  generally  known  to  provide  a  very  efficient 
and  elegant  basis  for  hardware-oriented  implementations.   Step-invariance 
and  the  deterministic  nature  of  the  algorithm  make  the  control  part  of 
implementation  certainly  very  simple,  while  the  requirement  that  addition 
be  the  only  primitive  operator  simplifies  the  operational  part  of 
implementation.   Furthermore,  the  property  allowing  a  limited  precision 
selection  provides  for  a  cost-effective  speed-up  of  the  algorithm  through 
the  use  of  a  limited  carry  propagation  mode  of  addition. 
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The  last  property,  although  easily  achieved  in  the  case  of  the 
generic  problem,  will  be  seen  as  essential  in  assuring  desirable  effects 
of  Property  1  in  the  general  algorithm  of  the  E-method. 

All  numerical  values  considered  here  are  assumed  to  be  represented 
in  a  finite  precision,  fixed-point  fractional  format,  with  a  representation 
error  | € |  <  r~  .   The  effect  of  representation  errors  on  the  E-method,  as 
discussed  in  Section  2.8,  is  minor  and  causes  only  a  slight  extension  of 
the  precision  required  to  represent  initial  data  with  respect  to  the 
prescribed  precision  of  the  result.   Thus  the  representation  errors  need 
not  be  of  immediate  concern  in  the  following  discussions.   It  will  be 
assumed  that  all  relevant  initial  data  have  the  precision  of  their 
representations  properly  adjusted  so  that,  for  given  m,  they  can  be 
regarded  as  exact. 

Definition  2.1;  An  m  digit  radix  r  representation  of  a  number 

x,  |x|  <  1,  is  a  polynomial  expansion 

m 
x  =  sign  x  •  Z  x.  r 
i=l  1 

where 

x.  £  D,    V  i 
l 

and  D  is  a  given  digit  set. 

Definition  2.2 :  For  a  given  radix  r,  a  set  of  consecutive 

integers  0  is 

i)  a  nonredundant  digit  set  if  its  cardinality  satisfies 

|D|  =  r 
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ii)  a  redundant  digit  set  if 

\D\    >  r 

Definition  2.3 :  A  symmetric  redundant  digit  set  is 
defined  as 


£>  =  {-p,-(p-l),...,-l,0,l,...,p-l,p) 
P 

where 


§  <  p  <  r  -  1 


assuming,  for  simplicity,  an  even  radix  r.   In 

particular,  £>  is 
P 

i )  minimally  redundant  if 


\D       =   r  +  1 

P 

so  that 


P  =2 


ii)  maximally  redundant  if 


0   =  2r  -  1 
P1 


so  that 


p  =  r  -  1  . 


Consequently,  the  representation  of  a  number  x  is  redundant  or 

nonredundant  depending  whether  x.eO  or  x.sfl.   In  the  case  of  a  redundant 

i  p     i 

representation 


sign  x  =  sign  x 


1 


m 


-l 


sox  -  Z  x.  r 
i=l  X 
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Example  2.1.   For  radix  r  =  k 

D  =   {0,1,2,3) 
Vin  "  (2,1,0,1,2) 
Dpmax  =  [3,2,1,0,1,2,3) 
where  the  overbar  denotes  the  negative  sign,  i.e.,  2  =  -2.   Then 

x  =  "23io  =  -113k  for  xi€D 

=  121  =  221  for  x.eD 

i     P 

=  H3  =  1313  =  133321  ...   for  x.efi  mflv  .  D 

1     pmax 


Theorem  2.2 


Let 


w(d)  =  d(d)  +  Z(J)  =  r(z(J-D  +  a  x(d-i)}  (2.19) 

be  the  basic  recursion  where 

j         is  the  recursion  index; 

r         is  the  radix; 

d  J JeD  is  the  j-th  generated  digit  of  the  result; 

z^J'      is  the  j-th  residual  such  that 

h(j)|  <t,   Vj, 
the  bound  £  satisfying 

\<t>  <i; 

a         is  the  given  coefficient  such  that 

|a|  <a, 
the  bound  a  satisfying 


0  <  a  <  - 

—   r 


\.^i] 
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and 

x    =  x.   is  the  j-th  digit  of  the  independent  variable  x. 
Then  the  following  selection  procedure,  defined  as  a  step- invariant  function 
s(w(j)): 


d(j)  =  s(w(tj))  = 


fsign  w(d)[|w(j)|+l/2j    for  |w(j)|  <p 


sign  w   [  |w  ^J        otherwise 


where  [wj  denotes  the  integer  part  of  w,  can  generate  in  m  steps,  given 
z    =  b,  a  sequence  of  digits  (d^   ,  d   ,  ...,d   )  such  that 

y  -  y   <  r 


where 


*   m   (i)  -i 
y  =  Z  duy  r  J 


3=1 

and 

y  =  a  Z  xU;  r~3   +   b  . 
0=1 

Proof: 

The  selection  function  s(w   ),  for  practical  purposes,  represents 
a  rounding  procedure,  modified  at  the  endpoints  of  the  domain  to  avoid 
digit  values  |d   '|  >  p.   It  simply  maps  a  w-subinterval  [k-l/2,  k+l/2 ) 
to  an  integer  k,  the  subinterval  boundaries  being  assigned  as  close 
or  open  as  convenient. 

The  consistency  of  the  recursion  formula  (2.19),  with  respect  to 
the  selection  function  s(w   )  is  established  by  proving  inductively 
the  boundedness  of  the  residuals  {z^').   By  the  condition  of  the  theorem: 
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Then,    assuming 


s(J_1)l   H, 


it  follows  that 

|w(o)|  <r|z(J-D|  +  rlax^-1^ 

<r5+r[i(l.tfe=llJ|p 

=  P  +  I 

By  definition  of  the  selection  function  s(w   ),  the  choice  of  digit  d^  ' 
can  always  be  made  such  that 

|z(^|  =  |w(^  -  d(j)|  <^  , 

as  desired. 

To  show  the  convergence  of  the  corresponding  algorithm,  it  may  be 
first  established,  by  substitution,  that  the  k-th  residual  satisfies  the 
following  equality: 

z(k)  =  rk[b  +  a  Z  x^  r^  -  E  d(j)r"d],   k=l,2,... 

By  definition 

m   /  .  x 
y  =  a  x  +  b  =  a  Z     xu;  r"°  +  b 
0=1 

and 

*    m   (i)  -1 
y  =  I,  dy ;  r  J 

0=1 
Therefore 


*    -m,  (m)      (m)x 
y  -  y  =  r  (zv  7  +  a  xv  ') 
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so  that 


*i  ^  _m/ 1  (m)  i   in  (ra)  i  \ 
y-y  I  <r   (|zv  J\    +|a|  |xv  J\  ) 


-m 


<r  (5  +a  p) 


.  -m 
<  r 

since 

{;  +  a  p  =  -^  <  1  for  5  <  1,  p  <  r  -  1  D 

It  may  be  noted  that  it  takes  (m+l)  steps  to  form  all  2m  digits 

of  y  since 

v  j(J)  -D    -m-1  (m+l) 
y  =  £  d    r   +  r     z 

0=1 
and  adding  the  last  residual  can  be  done  by  concatenation.   One  extra  step 
results  from  the  way  in  which  the  recursion  formula  (2.19)  was  defined. 
Theorem  2.2  indicates  precisely  an  algorithm  having  all  of  the 
required  properties.   It  does  not,  however,  indicate  explicitly  how 
addition  in  the  recursive  formula  can  be  replaced  by  a  limited  carry 
propagation  addition  so  that  the  same  simple  selection  function  s(w   ) 
will  remain  applicable.   For  that  purpose  the  selection  problem  and  the 
relationships  between  £,  a  and  p  bounds  need  to  be  analyzed  in  more  detail. 

2.k     An  Analysis  of  the  Selection  Problem 

The  selection  function  s(w)  can  be  interpreted  as  a  conventional 
rounding  rule,  conforming  to  the  given  conditions  on  £,  a  and  p  bounds. 
Therefore  the  digit  selection  process  itself  can  be  carried  out  in  a 
deterministic  fashion:  no  tests  need  to  be  performed  since  the  result  of  the 
selection  is  exactly  the  integer  part  of  the  rounded  >rJ'.  This  proved  to  be 
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a  particularly  practical  way  of  performing  selection  when  a  higher  radix 
is  utilized  [ERC73].  While  rounding  itself  needs  no  further  elaboration, 
a  more  detailed  analysis  of  this  selection  problem  appears  useful  in 
demonstrating  that  the  basic  recursion  can  be  performed  in  time  independent 
of  the  length  of  the  operands. 

The  selection  procedure  is  considered  here  to  be  defined  by  a 
function 

s  :  W  -  D  (2.20) 

P 

where  the  domain 

W  =  {w(j)|w(j)el  =  [-  p  -  £,  p  +  £])  (2.21) 

is  the  finite  set  of  values  w  ,   defined  by  the  recursion  formula  (2.19), 

and  the  range  D     is  a  redundant  digit  set. 
P 

Let 

ik-  Uk,V  '  kGDP  (2-22) 

be  a  subinterval  of  I,  with  lower  and  upper  endpoints  JL    and  u,  such 


that  if 


then 


»(Jk 


d(d)  =  s(w(d))  =  k 
is  a  valid  digit  choice. 

For  the  continuity  of  the  domain  W,  the  overlap  between  the 
adjacent  sub  intervals,  defined  as 


\  -  \  -  h+i '  ieDo  - (p)  (2-23) 
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must  satisfy 

Aj_  >  -  r"m  (2.210 

assuming  m  digit  precision  of  the  w    representation.   The  linear  form  of 
the  recursion  formula  and  the  selection  function  s(w   )  imply  that 

Aj_  =  A  ,  Vi  (2.25) 

Since  the  selection,  in  principle,  is  performed  by  comparing  w 
to  a  fixed  set  of  interval  breakpoints  -  comparison  constants  {c.}  as 

fk  cn    <  w^  <  cn     . 

/  .  x  k  -  k+1 

dUj   =  /  (2.26) 

Ul  ck+1<w^)<ck+2   , 

it  is  important  to  maximize  the  overlap  A  between  subintervals  so  that  a 
convenient  choice  of  comparison  constants  as  "simple, "  low  precision 
numbers  (e.g.,  l/2,  l/h,    etc.)  can  be  made.   Moreover,  the  precision  to 
which  w  J'  must  be  computed  for  selection  in  order  to  be  correct,  is  of 
the  same  order  as  the  precision  of  the  overlap  A.   Thus  a  sufficiently 
large  overlap  can  make  the  time  necessary  to  perform  the  recursion 
independent  of  the  precision  of  the  operands.   Let  the  comparison  constant 
c   satisfy 

°k  -  Vl  +  f  "  \  "  f  (2-27) 

Then,  instead  of  performing  selection  as  in  (2.26),  it  can  be  carried  out 


as 


~  i) 
k      c,  <  w^d;  <  c.  , 
,.>.  k  -        k+1 

d(j)  =   <  (2.28) 

k+1    c.  ,  <  w^'  <  c.  0 

k+1  -        k+2 
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|w(j)  -  w(j)|  <f 


provided  that 


where  w    is  computed  by  any  limited  carry  propagation  technique.   To 
satisfy  (2.29),  the  carry  needs  to  be  assimilated  only  in  ~|%  |a|  |  most 
significant  positions. 

The  effects  of  the  overlap  A  on  the  considered  selection  problem 
can  be  summarized  as  follows : 

i)  If  A=0  or  A=-r"  ,  the  full  precision  value  of  w  ^  must  be 

used  in  the  selection  even  though  the  comparison  constants 

are  simple,  i.e., 


(2.29) 


ck  =  k-2-,   keDp 
Furthermore,  to  satisfy  the  consistency  requirement  of  the 
recursion  formula  (2.19),  the  following  must  hold 


(2.30) 


c  -  k  I  <  t,     ,         V  keD 

K  p 


The  domain  continuity  implies 


(2.31) 


c  fc  _  r"m  <  £ 

k+1  -  b 


(2.32) 


so  that  the  residual  bound  must  satisfy 

-m 


£  > 


1-r 


c  >- 


if  A  =  -r 


if  A  =  0 


-m 


(2.33) 


This  establishes  the  lower  upper  bound  on  the  residuals  with  respect 
to  the  defined  selection  function.  Such  a  result  is  intuitively 


clear  since  the  value  of  the  lower  upper  bound  on  residual  z 


(J) 


cannot  be  less  than  one  half  of  the  smallest  positive  digit  d 
value  if  the  rounding  is  to  be  used.   The  selection  function 
s(w   ),  in  the  case  where  the  full  precision  is  used,  is 


(j) 


2k 

illustrated  in  Figure  2.2.   The  z-w  graph  is  analogous  to  the 
SRT  division  chart  [ROB58]. 
ii)  If  the  overlap  A  >  0,  the  benefits  of  a  precision  independent 
speed  can  be  introduced  in  a  rather  simple  way.   Namely,  by 
redefining  the  residual  bound  £  to  satisfy 

\   (1+A)  <  t,   <  1  (2.3*0 

for  the  given  overlap  A,  the  same  comparison  constants  {c.},  and 
hence  the  selection  function  s  as  in  the  full  precision  case, 

may  be  retained  while  using  a  low  precision  selection  argument 

~(i)  (i) 

w    rather  than  w   .  An  example  of  the  corresponding  z-w  graph 

with  an  overlap  A  =  1/2  and  the  residual  bound  £  =  3>/h   is  given 

//s(  i )  \ 
in  Figure  2.3.   For  the  selection  function  s(w  °    )   to  satisfy 

s(w(^})  =  s(w(j'}) 

i.e.,  to  work  correctly,  w    must  be  computed  so  that 


w 


(J)  -  w(j)|  <l/k   , 


a  trivial  constraint  indeed. 
The  potential  for  a  computation  time  independent  of  the  precision  of 
operands  is  implicit  in  the  recursion  formula  (2.19):   it  requires  a  rather 
superficial  change  in  the  bound  £  and  an  appropriate  representation  of  w 
allowing  for  the  simple  calculation  of  w  °    . 

2.5  The  Relationships  Between  Qi,    ^   and  p  Bound 

It  can  be  observed  that  there  is  a  strong  interdependence  between 
the  bounds,  en  on  the  coefficient  a,  ^  on  the  residuals  {z^'j  and  the 
maximal  value  p  of  the  digit  set,  by  definition.   The  values  chosen  for 
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these  bounds  largely  determine  the  basic  computational  properties  of  the 
generic  algorithm. 

The  bound  Oi  on  the  coefficient  a, 

0<  a  <±  [1  -  ^r'1)]     <i,  (2.35) 

—  r  p  r 

is  defined  in  this  particular  way  so  that  the  consistency  requirement  of 
Theorem  2.2  can  be  easily  satisfied.   It  immediately  follows  that 


since 


C<^x  (2.36) 


a  >  o  . 


Also,  because  p  <  r-1, 

!<£<!*  (2-37) 

the  lower  bound  on  £  being  established  earlier  on  the  basis  of  the  selection 
function  consideration  (2.33).   Furthermore,  when  the  selection  subinterval 
overlap  A  is  used,  £  bound  is  increased  by  a/2,  thus, 

A<|^-1  (2.38) 

x 

which,  for  a  minimally  redundant  digit  set  D  ,  with  p  =  p,  becomes 

A<^-  (2.39) 

and  for  a  maximally  redundant  digit  set,  where  p  =  r-1, 

A<1,  (2.  to) 

confirming  a  generally  known  property  that  more  redundancy  in  the 
representation  system  provides  for  a  faster  and  easier  computation. 
The  bounds  a   and  £  determine  required  scaling  for  arbitrary 
coefficients  a  and  b.   It  may  be  noted  that,  again,  increased  redundancy 
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reduces  scaling  although  not  significantly.   The  scaling  presents  no 
problems  in  the  generic  algorithm  but,  in  general  it  would  be  desirable 
to  maximize  the  bound  en.   However,  since  a  <  — ,  there  is  little  justification, 
not  at  least  in  a  practical  sense,  to  constrain  £  or  p  in  order  to  increase  Oi. 

A  summary  of  the  relationships  among  the  bounds,  with  respect  to 
the  operating  modes  of  interest,  is  given  in  Table  2.1. 

2.6  The  Algorithm  G 

Although  Theorem  2.2  itself  specifies  the  algorithm  for  solving 
the  generic  problem,  this  will  be  formalized  again  here  as  the  Algorithm  G 
for  reference  convenience. 


Algorithm  G 


/  Initialization  / 

1.  z^~b;x(0)~0; 
/  Recursion  / 

2.  for  j  =  1,2,  ...,m: 

2.1  w(j)  ^r(z(J-l)  +  ax^1)); 

2.2  d(d)  -  s(w(j)); 

2.3  zU)  ^w(j)  -  d(j); 
/  Termination  / 

HALT 
I   The  result  is  y  =  Z  dU;  r"J  / 


□ 


Table  2.1  Relationships  Between  Bounds 
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Redundancy  in  0  : 
P 

Mode  of  Operation: 

1 

Minimal 

Maximal 

r 

P  =  2 

p  =  r  -  1 

Operand 

Precision  Dependent 

^   2 

-  2 
r 

a^k 

A  =  0 

Operand 

Precision  Independent 

e  =  |  (i+A) 

0  <  A  <-^r  "  1 
r-1 

a  <~[1-A(r-1)] 
r 

a  <gjj[l-A] 
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Example  2.2 

Compute  y  =  ax  +  b  using  the  Algorithm  G  for 
a  =  0.00101011       r  ==  2 

b  =  0.01011001 

P 

x  =  0.10111001       m  =  8 


Dp  =  {1,0,1} 


U) 


w 


(d) 


(j) 


(d) 


1 

1 

0.1011001 

1 

-o.oiooni 

2 

0 

-0.0100011 

0 

-0.0100011 

3 

1 

-o.iooono 

1 

0.0111010 

U 

1 

1.0011111 

1 

0. 0011111 

5 

1 

0.1101001 

1 

-0.0010111 

6 

0 

-0.0000011 

0 

-0.0000011 

7 

0 

-0.0000110 

0 

-0.0000110 

8 

1 

-0.0001100 

0 

-0.0001100 

The  computed  solution  is 

8 
Z 


y*  =    z    a^  2"J"  =  o.iomooo  =  o.oiiiiooo 


while  y  =  0.0111100000010011.      It  can  be   easily  verified  that 

*  (8)  0-8  (8) 

y  =  y     +   zv    '  2        +  axv    J    . 


□ 


2.7  The  Computational  Algorithm  of  the  E-method 

The  computational  part  of  the  E-method  consists  of  a  unique 
algorithm  for  solving  the  system  L,  defined  for  a  particular  problem  f 
by  its  correspondence  rule  C  .   By  definition,  the  generated  solution  of 
the  system  L  is,  in  its  totality  or  in  part,  also  the  solution  of  the 
original  problem  f.   This  algorithm,  referred  to  as  Algorithm  E,  is  a 
direct  generalization  of  the  generic,  scalar  Algorithm  G  for  an  n 
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dimensional  vector  space,  n  being  the  order  of  the  system  L.   It  retains 
all  of  the  important  properties  of  the  generic  algorithm:  it  also  belongs 
to  the  class  of  digit-by-digit,  left-to- right  methods  and  generates  the 
solution  in  at  most  m+1  recursive  steps,  if  supported  by  a  configuration 
of  n  identical  elementary  units.   However,  the  Algorithm  E,  allows  in  a 
straightforward  manner  for  compromises  between  speed  and  implementation 
complexity.   Furthermore,  the  selection  function  s  and  the  bounds  a,  £  and 
p,  as  defined  in  the  generic  problem,  remain  applicable. 

Let  3  =  1,2,..., m+1  denote  the  recursion  index.   Define  the  j-th 
digit  vector  d^'  as 


d(j)  =  Cdp^d^,...^')]  ;  (2.1a) 


the  j-th  residual  vector  z    as 


z(j)  =  [z[i\z^\...,z^h  (2.142) 


and  their  vector  sum  w  ^  as 


w 


(o)  a  4<J)  +  s(d)  =  [^)f^)fmtt^U)1  {2M) 


where 

r(j)  _  Ad)    .    M) 


w£"  =  <±lJJ  +  zf  ,        i  -  1,2,. ..,n  ,   Vj 


Let  s(w   )  be  the  vector  selection  function 


;(w(j'})  =  [s(wp)),s(w^)),...,s(w^'))]  (2.kh) 


such  that 


djj)  =  s(wp))  ,   i  =  1,2,. ..,n,  Vj 
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and  each  s(w.   )  is  defined  as  in  Theorem  2.2  of  the  generic  problem. 
Furthermore,  define  the  n-th  order  matrix  G(x)  as 

G(x)  =  I  -  A(x)  =  (gid)nXn  (2.^5) 

where  I  is  the  identity  matrix  and  A(x)  is  the  coefficient  matrix  of  the 
system  L  defined  by  the  correspondence  rule  C-.   The  maximum  vector  norm 

llxll   =  maxlx.  I  (2.1+6) 

i 

and  the  consistent  matrix  norm 

n 
||a||     =  max   (  Z    la.  .  I  ),  (2.1+7) 

II IIqq  11 

i       3=1 

as  the  only  norms  considered,  will  be  denoted  as  ||x||  and  ||a||,  respectively. 

Theorem  2 . 3 


If 


|G(x)||<a;  (2.1+8) 

:  i;  (2.1+9) 


and 

d'^  e  9  ,  V.  V. 
i      P   i  3 

then  the  following  n-th  order  system  of  linear  recursions 
„(5)  =  ^(j)  +  £.U)  =  r(^-Df  G(x)  dU"l})  , 

3  =   l,2,...,m+l 
with  the  initial  conditions 

a<°>  -  o  , 


(2.50) 
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generates  m  leftmost  correct  digits  of  the  solution 

y_  =  A'^x)  b 
in  at  most  m+1  steps  as  the  sequence  {d   }>  d/J  ,  V    selected  according 

J 

to  (2.kk).      Namely,  the  generated  solution 

m+1   /.\   .    m+1   /.x   .       m+1   /.v 
Z  dU;r'J  =  [  L     cLUJr~J,  ...,   Z  d^;r~J]         (2.51) 


0=1  J=l  0=1 


satisfies 


k-Z*ll<r-m  .  (2.52) 


Proof: 

The  consistency  of  the  system  of  recursions  with  respect  to  the 
selection  function  (2.kh)   is  proved  inductively.   By  the  statement  of  the 
theorem, 


Assume  that 


Then 


l*(0)ll  <  6 


\z^-^\\  <  5 


|w(d)||  =  ||d(d)  +  z(J}||  (2.53) 

=  r||z(j'-l}  +  G(x)  d^"1^ 

<  rfe^-1^   +  rljGC^)!!-      II 

<  r  (;  +  r  a  p 

„    rl,_   t(r-l)-, 

=  r  5  +  r[-(l  -  «- — <•]  p 

=  P  +  ^ 


3h 


Since 


^U)   _  yU)   _   ^(J)  g^  sign  dU)  =  sign  VU)   f 
by  definition  of  the  selection  function  s(w   ),  it  immediately  follows  that 

H£(i,)ii  <  5  • 

The  convergence  is  proved  by  showing  that  the  solution  error  vector 
h  =  [h1,h2,...,hn]  =  y_  -  /  (2.54) 


satisfies 


|h||<r"m.  (2.55) 


After  m+1  steps  the  following  holds: 


m 


, (m+l)  (m+l)         m+1.         _.,    \r   v      -.(j)  m+1- j  -,  /0    _,-,. 

dv        '    +  zv  =  r         b   +  G(x)[   Z     cTd/r  d]  (2.5b) 

3=1 


m 


-I[Z     d^rm+1^] 

3=1 


or 


rri 


or 


Let 


-m-l/,(m+l)  (m+l)x       ,         .  /    \r   v     j(j)  -J*i 

r  (d/        '    +-  zv        ')   =  b   -  A(x)[   Z     d      yr      ] 

3=1 


-m-l      (m+1)       ,         .  /    x     *        _,/    v    , (m+l)   -m-1  /0   _„> 

zv        '   =  b  -  A(x)«y     -   G(x)   dv         yr  .  (2.57) 


e_(m+l)   =    [e[m+1\e^+1\...}e^+1h   -    (z(m+l)+G(x)d(m+l)  ^^   (2.58) 
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Since  y  =  A   (x)  b, 

fe-E-Z  =  A_1(x)-(-  e(mHl))  .  (2.59) 

The  error  after  mi-1  steps  is,  therefore,  hounded  by 

||h||  =  llA^txJt-  e(m+1))||  (2.60) 

<  IIa-^x)!!.!^1'!!  . 

Since  ||g(x)||  <  OS  <  1  by  definition  for  r  >  2,  the  matrix  G(x)  is  convergent, 
i.e., 

lim  [G(x)]P  =  0 

p-»oo 

By  the  well-known   result    [FAD63]: 

A_1(x)    =    [I  -  G(x)]"1  =   lim    (I   +  G(x)    +  G2(x)   +    ...    +  GP(x))      (2.6l) 

p-*=o 


so  that 


|A_1(x)||    =    ||I   f   G(x)    +  G2(x)    +    ...||  (2.62) 


<   ||I ||    »■    ||G(x)||    +    ||G(x)||2    + 


<  1  4-    z    a9 


P=l 


1-a  -  3 


Also 


I    (m+l)n  -m-ln    (mi-l)        „ ,    s    ,  (m+l)n  /_   ^„^ 

\e  J\\   =   r  ||zv         '    +  G(x)   d^         y||  (2.63) 


<  r  (£-K*p)     . 
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Therefore/ 


lull    ^     1  C^P  -m  ~m  ^o   /ri,  \ 

h     <  -z —  •   ^ — ^  *    r       =  7«r  (2.64) 

'-"  -  l-a  r  '  ' 


where 


/ 


2l^TT  <2-65 


for  minimally  redundant  digit   set  and 


7=\  (2.66) 

for  maximally  redundant  digit   set.      Since  y  <  1  for  r  >  1 

111   II   ^  "M  r-l 

||h||  <  r  D 

Theorem  2.3  completes  the  general  specification  of  the  computational 
part  of  the  E-method.   Clearly,  the  recursion  formula  (2.50)  may  be 

computed  in  time  independent  of  the  operands  precision,  provided  that  the 

~(i )  (i ) 

approximate,  low-precision  value  w  °  '    of  the  sum  w    satisfies  the 


selection  requirement 


|w_(j)  _  £(j)||  <|  (2<67) 


where  A  is  the  given  overlap  between  the  selection  subintervals,  as  discussed 
in  the  generic  problem. 

It  may  be  seen  from  the  recursions  (2.50)  how  Property  1,  one -step 
dependent  generation  of  digits  {d   },  and  Property  3>    on-line  restriction 
on  the  usage  of  digits  {d   },  as  introduced  in  the  generic  problem,  are 
critical  for  simplicity  and  speed  of  the  Algorithm  E:   not  only  is  the 
computation  of  the  recursion  formula  reduced  to  additions  but  the  effective 
delay  between  two  successive  applications  of  the  recursion  formula  is 
equivalent  to  the  time  necessary  to  generate  just  a  single  digit.   Some 
implications  of  these  properties  on  the  complexity  and  possible  speed-up 


of  computations  will  be  discussed  later.  Note  that  the  argument  of  the 
original  problem  f  appears,  after  initialization,  as  a  parameter  in  the 
system  of  recursions  (2.50),  and,  at  a  particular  step  j,  only  dVJ   is 
considered  as  the  independent  variable.   The  rate  of  convergence  of  the 
Algorithm  E  is,  by  definition,  restricted  in  that  it  must  be  linear  with 
respect  to  the  radix,  i.e.,  one  radix  r  digit  per  step.   However,  the 
resulting  computational  simplicity  of  the  recursive  formula  and,  equally 
important,  the  effective  way  of  introducing  parallelism  more  than  compensate 
for  such  a  restriction. 

The  error  properties  and  the  scaling  problem  of  the  E-method  will 
be  discussed  next,  while  a  summary  of  the  E-method  with  an  example  concludes 
this  chapter.   Certain  aspects  of  the  E-method  related  to  the  implementation 
and  performance  evaluation  will  be  considered  in  the  following  chapter. 

2.8  An  Error  Analysis  of  the  E-method 

The  error  behavior  of  the  E-method  appears  to  be  quite  favorable. 
First,  it  can  be  seen  that  if  the  system  of  linear  equations  L  satisfies 
the  bounds,  as  required  by  Theorem  2.3,  then  it  is  insensitive  to  small 
perturbations  in  elements  of  A(x)  and  b  since  the  corresponding  condition 
number  of  the  matrix  A(x)  satisfies 

k(A(x))  =  ||A(x)||-||A-1(x)||  (2.68) 

1-KX 

l-a 

<-l  ■ 

Second,  by  definition  of  the  computational  algorithm,  no  roundoff  errors  are 
generated  when  E-method  is  applied.   However,  the  finiteness  in  the 
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representations  of  the  elements  of  A(x)  and  b  inevitably  introduces 


representation  errors  which  will  systematically  propagate  to  the  left  and 
eventually  invalidate  selection  of  digits  {d.   }  and  hence  destroy  the 
accuracy  of  the  result.   In  view  of  (2.68),  it  is  clear  that,  by  considering 
representation  errors  as  perturbations  of  the  correct  values  of  the  elements 
of  A(x)  and  b,  no  serious  difficulty  should  be  expected.   Indeed,  it  will 
be  shown  that  the  ill-effects  of  these  representation  errors  can  be 
compensated  for  by  an  extended  precision  m'  in  the  coefficients  representa- 
tions with  respect  to  the  prescribed  precision  m  of  the  result. 

To  determine  m'  >  m  such  that  the  first  m+1  digit  vectors  d    are 
correctly  selected,  assume  that 

3(x).  G(x)  +  ^(x)  .  (g.  .)QXn  ♦  (e.  .)nxn  .  (I..)nXn     (2.69) 

represents  the  matrix  of  exact  elements  {g. .)  while  G(x)  is  their  finite 

precision  representation  matrix  with  the  corresponding  error  matrix 

E  (x)  such  that 
— G 

-m' 


Similarly, 


and 


e.  .|  <  r   ,  V.  V.  (2.70) 


~(j)  =  W(J)  +  e(j)  (2.7i) 

—      —      -^w 


~(J)  =  z(j)  +  eU)  (2.72) 

—      —      -z 


Since,  by  definition  of  Algorithm  E 

w(3)  _  d(a)  „  Z(J)  }   v 
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then 


or 


~Q)_e(j).d(j)  =  2(o) 

iw.ii 

1 


~(j)  =  2(d)  +  e(d)  =  (d)  +  e(j) 

i      1      w.     1      z. 

i  l 


so  that 


e(j)  =  e(j)   and   e(^  =  e(j)  (2.73) 

z.     w.         — z     — w 
1      1 

By  substitution,  applying  (2.73),  it  follows  that 

(m+1)  =  rm+l   (0)       J   (j)  -j-,  (   lf) 

-w  — z     -G  .  ,  —  ' 

For  the  selection  to  be  correct,  the  discrepancy  between  the  true  and  computed 
selection  argument  must  satisfy 

||e(m+l)n  =  n~(m+l)  _   (mfl)n  <  £  (    } 


Since 


and 


l4md)H  <  -m+1(ll40)ll  -  11^(^)11-11  £  l(j)r-J||) 

0=1 


l40)H  ■  *,  <  '""  ; 


||E  (x)||  <  max  Z  |e.  |  ;  (2.76) 

~G     "  i  d=l  1J 

m    /  .  n 
||  Z     dU;  r"J||  <  1  , 

d=i 

it  can  be  found  that  the  precision  of  coefficient  representation  must  satisfy 

m'  >m  +  1  +  %  (|  n')  (2.77) 

—         *r  A 


1+0 

where  n'  is  the  maximum  number  of  possible  nonzero  elements  in  any  row  of 
the  matrix  A(x).   In  the  most  important  applications  considered  here,  n1 
is  very  small.   For  example,  in  the  case  of  the  rational  function  evaluation 
problem,  n'  =  3  so  that,  for  r  =  2  and  A  =  1/2,  m'  >  m  +  1  +  fog.   12  ~  m  +  5. 

2.9  The  Scaling  Problem 

The  conditions  (2.1+8)  and  (2.1+9)  of  Theorem  2.3  on  norms  of  matrix 
A(x)  and  vector  b,  imply  that,  in  general,  an  adjustment  of  the  size  of 
the  elements  a. .  and  b.  will  be  required.   Although  scaling  commonly  appears 
whenever  fixed-point  representation  arithmetic  is  used,  it  can  be  handled 
without  serious  difficulties.   The  E-method,  however,  requires  more 
consideration  of  the  scaling  problem. 

The  scaling  problem  of  the  E-method  will  be  considered  here  in 
general  terms,  with  specifics  to  be  given  later  for  particular  applications. 
The  simplest  problem  in  scaling  which  may  arise  is  that 

||A(x)||  <  1  +  a  (2.78) 

but 

INI  1 5 

However,  instead  of  solving 

A(x)  y  =  b 

one  can  solve  an  equivalent  system,  scaled  as  follows 

A(x)  S  y  =  S  b         _  (2.79) 

or 

A(x)  y»  =  b1 


kl 


where 

lib '  II  =  lis  b ||  <  c 


The  scaling  matrix  S  =  (s. .)  ..   is  defined  as 

—     ij  n  Xn 

-cf 

r      for  i=j 


s. . .  <;  (2.8o) 

0      for  i^  j 


where  a  is  a  positive  integer  such  that 

r""ff||b||  <  S       . 

or 

11*11 

a  =    ^r  T"^ 

Clearly,  in  order  to  retain  the  same  number  of  significant  digits  in  y' 
as  in  y,    one  must  carry  a  extra  steps.   Then 

y  =  s"1  y' 

Multiplications  with  S_  and  S   involve  only  a  shift  of  a   positions  right 
and  left,  respectively.   Therefore,  the  case  (2.78)  is  always  trivially 
solvable  and  one  need  be  concerned  only  with  the  case  when  matrix  A(x) 
requires  scaling. 

The  problem  of  scaling  the  matrix  A(x)  may  be  considered  equivalent 
to  a  problem  of  transforming  a  given  matrix  A,  with  arbitrarily  large 
elements  a. .,  into  a  diagonally  dominant  matrix  A  such  that 

a..   >  k  a .  .   ,   vijt  1 


1+2 

k  is  a  given  constant.   This,  in  general,  requires  prohibitively 

complex  computation  in  view  that  A(x)  depends  on  independent  variables  of 

the  original  problem.   For  example,  consider  a  general  technique  which 

can  be  used  to  reduce  a  given  system  of  linear  equation  to  a  form,  convenient 

for  iteration  [DEM73].   By  definition,  in  our  case,  det  A(x)  /  0  so 

(A_1(x)  -  S)  A(x)  y  =  (A_1(x)  -  S)  b  (2.8l) 

or 

y  =  S  A(x)  y  +  (A_1(x)  -  S)  b 

where  S_  is  the  scaling  matrix  (2.80),  being  defined  so  that 

||s  A(x)||  <  l  +  a 

The  form,  although  slightly  different  from  that  of  Algorithm  E,  is  acceptable 
and  matrix  A(x)  has  been  effectively  scaled.   However,  computation  of 
(A  (x)  -  S)  b  would  require  an  amount  of  work  inc omen sur able  with  the 
expected  performance  of  the  E-method.   There  are,  fortunately,  many 
important  applications  where  scaling  of  matrix  A(x)  does  not  appear  to  be 
a  problem.   It  is  of  practical  importance  that  certain  problems  allow  for 
redefinition  in  a  convenient  way  so  that  scaling  again  is  simple  matter. 
With  respect  to  scaling,  two  classes  of  applications  of  E-method 
can  be  distinguished.   In  one  class  there  are  problems,  for  instance,  the 
evaluation  of  functions,  where  all  parameters  are  known  a  priori  so  that 
scaling  is  a  one-time  job  for  a  particular  argument  range.   In  this  class, 
the  E-method  applies  without  any  overhead.   In  the  other  class  are  the 
problems  with  arbitrary  parameters  so  that  scaling  must  be  performed  each 
time  the  E-method  is  applied.   This  involves  an  overhead,  commonly 
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encountered  in  any  other  method  of  computation  in  a  fixed-point  representation 
domain.   The  applications  of  the  E-method  in  this  class  would  make  the 
extension  of  the  method  to  a  floating-point  representation  domain  highly- 
desirable.   It  may  be  noted  that  the  E-method  is  easily  applicable  when  a 
block- floating  point  representation  is  used. 

2.10  Summary  of  the  E-method 

The  E-method  is  summarized  below  for  reference  convenience: 

Part  1  ( C  or  r  e  sp  ond  enc  e ) 

Given  an  L-reducible  computational  problem  f,    apply  the 

correspondence  rule  C  on  the  arguments  and  parameters  of  f 

in  order  to  obtain  the  coefficient  matrix  A  and  the  vector 

b  of  the  system  of  linear  equations  L.   The  correspondence 

rule  C  must  guarantee  that  the  elements  of  A  and  b  can 

be  made  conformable  to  the  conditions: 

n 

E  la.  .  I  <  a   .        V  i 

3-1  1J  ~ 

Part  2  (Computation) 
Algorithm  E 
/  Initialization  / 

1.  z(0)  <-  b;  G(x)  -  I-A(x);  d(°to; 
/  Recursion  / 

2.  for  3   =  1,2,  ..  .,m+l: 

2.1  w^^r(z(d~l)  +  G(x)  d(j_l)); 


1+1+ 

2.2  d(j)*-s(w(j)); 

2.3  z(Jlw(J}  -  d(j); 
/  Termination  / 

3.   HALT 
/  The  result (s)  of  f,  for  the  given  precision  m,  are 
represented  by 

*   m+1  (i)  -i 
y  =  Z  du;  r  J 

0=1 

Example  2.3: 

As  a  general  example  of  the  E-method  we  present  the  evaluation  of 
R  k(x)  as  an  approximation  to  sinh(x),  xe  [0,1/8],  with  a  precision 
of  13  decimal  digits.   The  coefficients  are  taken  from  [HAR68, 
SINH2002,  p.  10l+,  p.  216],   Before  normalizing  q  to  1,  they 
appear  as  follows : 

pQ  =  0.0 

P-,   =  0.5353890l+56o87786*l03 

P2  =  0.0 

p     =  0.56l+627l+506878l+9*l02 

qQ  =  o.5353890l+56o879l+*lo3 

qx  =   0.0 

q      =  -0. 32769^3311233^  102 

q3   -   0.0 

%  =   i'0 

The   other  parameters  are:      r  =  2,    n  =  5,    £  =  3/h,  a  =   l/8,    a     =  1, 


m 


1    =  kk  +  1  +  1  =  k6.      For 


^ 


x  =  0.101973^533301, 


the  evaluation  is  illustrated  in  Figure  2.k.      The  generated  value 


* 
y,  satisfies 


(sinh(x)  -  y*)/sinh(x)|  <2  5  .  D 


k6 
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3.      ON   DEPIGMENTATION  AND   PERFORMANCE 
OF  THE  E-METHOD 


3.1  Introduction 

We  introduce  in  this  chapter  a  definition  of  the  basic  computing 
"block,  the  elementary  unit  EU,  and  consider,  in  some  detail,  its 
implementation.   Then,  a  graph  model  of  a  general  computing  configuration, 
constructed  by  connecting  several  elementary  units,  is  defined  as  a 
convenient  form  of  representation,  exhibiting  directly  the  basic  parameters 
which  determine  the  performance  of  the  E-method.   Finally,  we  discuss 
several  modes  of  operation,  in  which  the  E-method  can  be  applied,  in 
relation  to  the  general  implementation  properties  of  the  method. 

3.2  The  Basic   Computational  Block 

The  basic  computational  block,  the  elementary  unit  EU.,  is  a 
hardware  structure  implementing  the  basic  recursion  formula  (2.50)  or, 
more  precisely,  the  recursion  step  of  the  Algorithm  E.   Due  to  its 
simplicity,  it  seems  sufficient  to  indicate  only  the  major  parts  and  a 
control  strategy  of  an  EU  in  the  present  discussion,  without  considering 
low-level  details  of  an  actual  implementation.   Hopefully,  our  global 
description  will  suffice  to  estimate  precisely  enough  the  performance  and 
complexity  features  of  an  EU,  and,  consequently  of  the  E-method. 

It  seems  preferable,  from  the  implementation  point  of  view,  to 
restate  the  basic  recursion  formula  (2.50)  as  follows: 


^9 


w"'  =  rdr""*'  +  £  a   ^J^)  (3.1) 

k=l 


where  a. .  =  -1,  so  that  the  need  for  explicitly  calculated  residuals  z 
n  a. 

is  avoided. 

As  indicated  by  Figure  3.1>  where  a  global  structure  of  the 
elementary  unit  EU.  is  shown,  the  evaluation  of  w.   basically  requires 
an  s-operand  adder.   In  many  practical  applications  s  is  rather  small. 
In  order  for  the  time  of  addition  to  be  independent  of  operand  precision, 
w.   need  to  be  represented  in  a  redundant  form. 

The  selection  procedure,  defined  by  the  selection  function 
s(w.   )  is  performed  by  the  block  S,  which  forms  w.   by  converting  a 

few  of  the  most  significant  digits  of  w.   into  nonredundant  form.  After 

~(i)  (1) 

rounding  w.   ,  the  integer  part  represents  the  selected  digit  d.   .  The 

precision  of  w.   is  basically  determined  by  the  overlap  A  and  the  number 
s  of  suramands.   All  functions  of  the  selection  block  S  can  be  easily 
implemented.   The  previous  value  d.     is  saved  in  register  D.   The 
coefficients  {a.,  }  may  be,  for  better  storage  efficiency  and  simpler  adder 
structure,  represented  in  a  nonredundant  form.   The  single,  signed  digit 
radix-r  multipliers  &}        are  incorporated  through  the  selection  networks 
{SN,  },  each  capable  of  forming  required  multiples  of  a.,  .   The  carry 
generator  C  would  be  needed  if,  for  example,  a  radix  complement  representa- 
tion of  negative  numbers  is  adopted.   In  that  case,  the  selection  networks 
merely  form  direct  or  complement  of  a  possibly  shifted  value  of  a.,  .   The 
complexity  of  the  selection  networks  increases  for  higher  radices,  and 
since  the  additional  multiples  appear  as  summands,  complexity  of  the  adder 
will  also  be  increased.   Therefore,  a  higher  radix,  while  reducing  the 
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necessary  number  of  steps  for  a  given  precision,  does  increase  both  the 
time  to  perform  the  basic  recursion  and  the  complexity  of  the  corresponding 
elementary  unit.   The  problem  of  finding  an  optimal  radix  under  given 
application  and  implementation  conditions  will  not  be  considered  here. 

The  central  part  of  an  EU,  the  multioperand  adder,  can  be 
implemented  in  various  ways  in  order  to  achieve  the  desired  speed/cost 
factor.  The  registers  R,  store  the  corresponding  coefficients  a., 
throughout  a  particular  evaluation;  their  number  for  the  elementary  unit 
EU.  is  determined  by  the  number  of  nonzero  elements  in  the  i-th  row  of 

the  matrix  G(x),  i.e.,  the  number  of  inputs  to  EU. .   Register  R  and  the 

- ■  1  w 

corresponding  data  path  must  accommodate  the  redundantly  represented  w.   . 

The  initialization,  i.e.,  the  execution  of  the  particular  correspondence 

rule  C  is  performed  by  loading  the  coefficients  {a.  }  and  b.  via  the 

initial  value  entry  bus. 

The  control  requirements  of  an  EU  are  very  simple:  assuming  a 

synchronous  mode  of  operation  of  the  entire  configuration  of  the  elementary 

units,  synchronizing,  clock  pulses  on  which  the  transfer  of  w.   into  R 

x        w 

occurs,  are  all  that  is  needed.  The  same  clock  pulses,  defining  the 
basic  step,  are  distributed  to  all  units.  By  controlling  their  number, 
one  can  easily  achieve  a  variable  precision  mode  of  operation,  as  discussed 
later. 

The  time  required  to  perform  one  recursive  step  on  an  elementary 
unit  is  defined  as: 


52 

where  t.  is  the  time  to  generate  w.   in  a  redundant  form,  t„  is  the 

selection  time  and  t  is  the  register  transfer  time.   Both  t  and  t 

correspond  to  a  few  gate  delays-- (3-^-)  t  ,  so  that  t  appears  as  the 

dominant  factor,  which  depends  on  the  number  s  of  summands  and  the  adder 

structure.   For  practical  reasons,  s  may  he  defined  to  denote  the  number 

of  radix-2  summands,  i.e.,  the  higher  radix  or  redundantly  represented 

operands  are  replaced  by  their  binary  equivalents.   Then  a  simple  adder 

structure,  consisting  of  s-2  levels  of  full-adder  rows,  will  have  a 

t.  =  2 (s-2)  t  ,  assuming  2t  per  full-adder.  More  sophisticated  adder 

structures,  i.e.,  Dadda-type  [H073]  can  considerably  reduce  this  time. 

However,  when  s  is  small,  like  in  polynomial  evaluation,  it  can  be  seen 

that,  for  radix  2,  t  =  0(lOt  ). 

0       g' 

3.3  A  Graph  Representation  of  a  General  Computing  Configuration 

An  L-reducible  problem  f  of  order  n  can  be  solved  by  the  E-method 
on  a  structure  consisting  of  n  interconnected  elementary  units.   The 
computational  algorithm  E  indicates  that  the  intercommunication  requirements 
are  simple  due  to  the  fact  that  the  elementary  units  are  functionally 
related  to  each  other  only  via  the  digit  vector  d.   This  implies  that  the 
physical  connection  between  the  units  EU.  and  EU .  only  needs  to 
accommodate  a  transfer  of  one,  signed  radix  r  digit.   In  common 
multiprocessor  structures,  used  for  fast  parallel  computations,  the 
processor  intercommunications  usually  require  full  precision  width. 

A  computing  structure  for  solving  a  given  problem  f  by  the  E-method 
may  be  conveniently  specified  by  a  computational  graph 

Gf(V,K)  (3.2) 
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where  V  =  {V. |i=l,  ...,n)  is  a  set  of  vertices  and  K  is  a  connection  matrix, 

defining  a  set  of  directed  arcs.  Each  vertex  V.  corresponds  to  an 

elementary  unit  EU.,  symbolically  represented  as  in  Figure  3.2  where  the 

outgoing  arc  d.  carries  the  digit  generated  by  EU.  and  one  or  more 

incoming  arcs  d.,  inputs  to  EU.,  carry  the  digits  generated  by  {EU.).   The 
J  ■*■  J 

connection  matrix  K  =  (k. . )  ^   is  in  one-one  correspondence  with  the 
-     lj'nXn 

matrix  G(x)  =  I  -  A(x)  of  the  system  L,  as  follows: 

k.  .  =  <  (3.3) 

L  0    if  g. .  =  0 

and  k.  .  =  1  specifies  that  the  arc  d.  is  an  incoming  arc  for  the  vertex 

V.,  i.e.,  the  connection  matrix  K  defines  the  relation  "receives  from" 
x  — 

between  the  elementary  units  {EU.}  of  the  computing  structure.  Note  that 

the  utilization  of  d.  for  internal  functions  in  EU. ,  as  indicated  in 

i  i 

Figure  3.1>  need  not  be  specified  on  the  graph  unless  g. .  ^  0. 

No  attempt  is  presently  being  made  to  treat  the  properties  and 
applications  of  the  above  defined  computational  graphs  in  a  rigorous 
and  extensive  way.  Rather,  some  basic  observations  are  made  and  several 
examples  are  given  to  illustrate  usefulness  of  computational  graphs  in 
the  E-method. 

Strictly  speaking,  there  is  only  one  functional  primitive,  the 
elementary  unit,  which  appears  in  any  application  of  the  E-method.   Yet 
for  convenience,  four  types  of  nodes,  obtainable  by  obvious  modifications 
of  the  elementary  unit,  are  defined  as  primitives  in  a  sense  that  no  G  , 
where  f  is  an  L-reducible  problem,  exists  which  cannot  be  constructed 
from  these  four  primitives. 


Figure  3.2  Elementary  Unit:  Graph 
Representation 
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The  conversion  primitive  G  ,  shown  in  Figure  3.3  (a),  satisfies 
the  following  output  function: 

Z  dp}  r"d  =  c.0  (3.U) 

where  c.^  is  a  constant,  possibly  in  a  nonredundant  form.   If  d! .   e£>,  as 
iO  1 

is  the  case  in  the  E-method,  the  Gp  primitive  performs  a  serial  conversion 
of  c   into  the  corr responding  redundant  form.   For  completeness,  one  could 
define  a  primitive  without  an  outgoing  arc  to  perform  serial  conversion 
into  a  nonredundant  form.   However,  such  a,  primitive  would  require  a 
facility  for  full-precision  carry  propagation,  unlike  the  other  primitives. 
For  that  reason,  we  prefer  to  assume  that  the  conversion  to  a  conventional, 
nonredundant  representation,  when  necessary,  would  be  done  in  a  separate 
addition  step,  on  a  separate  unit. 

The  multiply  primitive  GL.,  Figure  3.3  (b),  satisfies  the  input-output 
function : 

L  dp'>  r-3  =  o±0  +  e        L  «M  r"3  (3.5) 

0=1  0=1 

and  it  is  a  degenerate  case  of  the  inner  product  primitive  G  ,  Figure  3.3  (c), 
which  satisfies 

00       ,    .  v       .  (X)      00  i        > 

.\di  r'3  =  ciof  .=  =  ci^  r"°'        (3-6) 

0=1  0=1  k=i 

Mi 

The  divide  primitive  G  ,    Figure  3.3    (d),    satisfies 

=     dp>r-J   -^-  (3.7) 

0=1  il 
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0 


(a) 


O-* 


(b) 


(c) 


(d) 


Figure  3.3  Computational  Primitives 
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In  all  primitives,  c   are  the  coefficients  assumed  to  satisfy  the 

conditions  of  Theorem  2.3.   For  example,  the  computing  structure  for  the 

evaluation  of  a  polynomial  has  a  graph  G^  as  in  Figure  3.^  (a),  the 

evaluation  of  a  rational  function  has  a  graph  G  ,  Figure  3.*+  (b ),  while  the 

K 

evaluation  of  an  expression  like 


(ft)  (3-8) 


k=l   x   k 
requires  a  structure  with  a  graph  G  ,  Figure  3.^-  (c). 

The  computational  graph  G  may  be  acyclic  or  cyclic,  as  previous 
examples  show.   The  graph  G  is  acyclic  if  and  only  if  its  connection 
matrix  K  is  strictly  upper  (lower)  triangular.   In  a  practical  sense,  a 
cyclic  graph  G  corresponds  to  a  problem  f  which  involves  division. 

Importantly,  a  graph  G  provides  a  direct  estimate  of  the  required 

time  and  implementation  complexity.   If  N(K)  denotes  the  number  of  ones 

in  the  connection  matrix  K     then  the  computational  structure  requires 

at  most  N(K)  +  2n  m-digit  registers,  n  adders  with  total  of  N(K)  +  2n 

operands  and  N(K)  single  digit  interconnections.   It  is  assumed  that  two 

registers  are  sufficient  to  store  w.  value  in  a  redundant  form.   If 

l 

N.  =  | {cL  } |  denotes  the  number  of  inputs  to  the  i-th  elementary  unit, 
then  EU.  requires  s  =  (N.+2)  operand  adder  and  that  many  registers. 

The  time  t_  required  to  perform  one  recursive  step  is  clearly 
determined  by  the  parameters  of  the  elementary  unit  EUV  such  that 
N  =  max  (N. ).   The  total  time,  then,  for  the  E-method  evaluation, 

K      .      1 
1 

assuming  an  m-digit  precision,  is 

TE(f)  =  (m+1)  tQ  (3.9) 


gp: 


gr: 


<D 


(a) 


(b) 
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•  •  • 


Gf: 


Figure  3.k     Examples  of  Computational  Structures 
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if  no  scaling  is  required,  and 


T£(f)  =  (m+l+a)  tQ  (3.10) 

otherwise,  where  integer  a  >  0  is  defined  by  the  particular  scaling 
requirements. 

3.k     On  Modes  of  Operation  and  Implementation 

The  functional  properties  of  the  E-method,  namely  the  step- invariant 
nature  of  its  computational  algorithm  and  linearity  of  its  basic  operator, 
make  an  adaptation  both  to  a  variable  precision  mode  of  operation  and  a 
variable  number  of  elementary  units  rather  straightforward. 

A  variable  precision  mode  of  operation  is  very  desirable  from  the 
user's  point  of  view.   Since  the  computational  algorithm  of  the  E-method 
is  deterministic,  in  the  sense  that  no  tests  need  to  be  performed  to  check 
the  attained  accuracy  at  each  step,  the  desired  accuracy  of  the  result  is 
specified  directly  by  the  given  number  of  recursive  steps.   This  leads 
in  a  natural  way  to  a  variable  precision  operation.   The  second  aspect, 
i.e.,  the  ability  of  the  method  to  perform  without  severe  degradation 
while  using  the  limited  resources,  or  its  implementation  flexibility  in 
other  words,  is  also  of  practical  importance. 

Consider  the  application  of  the  E-method  in  a  given  problem  of 
order  n  and  precision  m.   Let  n  denote  the  number  of  available  elementary 
units  of  precision  m.  When  n  <  n  and  m  <  m,  clearly,  no  problem  exists 
in  achieving  a  variable  precision  mode  of  operation:  the  given  value  of  m, 
declared  by  the  user,  can  control  the  number  of  steps  to  be  executed  by 
the  elementary  units  and,  hence,  the  precision. 


6o 

If,  however,  m  >  m,  each  elementary  unit  should  be  augmented  with 
additional  storage  to  accommodate  the  extra  precision  of  the  operands. 
The  control  of  the  elementary  units  should  be  modified  so  as  to  allow 
multiple  precision  addition,  with  the  selection  process  carried  only  when 
the  most  significant  word  of  the  sum  has  been  generated.   As  shown  in 
Figure  3-5>  the  additional  storage  could  be  conveniently  implemented,  for 
example,  when  m  =  £m,    as  a  circular  array  of  £   parallel-in,  parallel-out 
registers,  connected  to  a  corresponding  operand  register  in  the  elementary 
unit  via  already  existing  bus  (Figure  3.1). 

The  same  solution  is  applicable  in  the  case  when  n  >  n,  i.e.,  when 
the  number  of  elementary  units  is  insufficient.   Now  each  array  row  of 
the  additional  storage  would  contain  the  data  corresponding  to  one  recursion. 
Also,  storage  for  the  digit  vector  d  must  be  provided  in  the  same  way. 
Again,  the  control  modifications  are  minor. 

Finally,  one  can  consider  the  case  where  the  elementary  unit  itself 
has  a  restriction  on  the  number  s  of  operands  it  can  handle  simultaneously. 
It  is  easily  seen  that  the  same  approach  as  above  will  suffice  to  accommodate 
the  necessary  number  s  of  operands.   Let 


k  = 
n 


k  = 
m 


k  = 
s 


n 

n 

m 

m 

s 
s 


(3.11) 


then  the  computation  time  is  affected  as  follows: 


T„(f)  *  k  k  k  (mfl)  tn  (3.12) 

E       n  m  s       0 
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Figure  3.5     Multiple   Precision  Scheme 
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where  n,  m,  s,  and  t  are  the  parameters  of  the  available  elementary  units 
and  n,  m,  and  s  are  parameters  of  the  given  problem. 

The  previous  discussion  reveals  the  flexibility  of  the  E-method:   it 
can  be  implemented  under  a  wide  range  of  speed/cost  constraints  in  a  simple 
way.   The  cost  change  in  precision,  in  the  number  of  elementary  units  or 
in  their  complexity,  affects  the  speed  of  computation  linearly.   In  addition, 
the  E-method  is  clearly  amenable  to  a  modular  implementation.   For  example, 
the  basic  module,  implementated  in  a  LSI  technique,  could  be  a  l6  bit  unit 
with  a  k- operand  adder,  k   registers  and  a  selection  and  carry  block  which 
can  be  by-passed  so  that  a  larger  precision  elementary  unit  could  be  simply 
constructed  by  concatenating  the  required  number  of  basic  modules.  An 
additional  module  could  be  a  8  x  l6  bits  circular  register  array.   By 
combining  these  two  types  of  modules  one  could  easily  achieve  a  desired 
computing  structure  to  support  the  E-method. 

It  is  of  certain  interest  to  discuss  the  modes  of  operation  of  a 
computing  structure  from  another  point  of  view.   Since  Algorithm  E  always 
generates  the  results  in  a  digit  fashion,  the  most  significant  digit 
generated  first,  an  "on-line"  mode  of  operation  seems  to  be  a  natural  way 
to  provide  for  a  faster  execution  of  sequences  of  interrelated  computations. 
In  other  words,  if  Algorithm  E  can  be  made  to  satisfy  the  "on-line" 
property  (cf.  p.  1*+)  with  respect  to  all  operands  appearing  in  the  basic 
recursion,  a  significant  overlap  and,  hence  the  speed-up  in  computing  can 
be  easily  achieved.   In  general,  the  "on-line"  mode  of  operation  implies 
that  the  j-th  digit  of  the  result  can  be  generated  before  the  (j+5)-th 
digits  of  the  operands  are  available.  Algorithm  E  already  satisfies  the 
"on-line"  property  with  respect  to  the  digit  vector  d  and  it  is  a  rather 
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straightforward  modification  of  the  basic  recursion  which  is  required  in 
order  to  extend  the  same  property  to  all  operands.  We  mention  also  that 
the  "on-line"  capability  of  the  E-method  can  be  particularly  important  in 
a  possible  real-time  application. 
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h.      ON  APPLICATIONS  OF  THE  E-METHOD 

k.l     Introduction 

In  this  chapter  several  L-reducible  problems  are  described  as  the 
representative  examples  of  some  applications  of  the  E-method.   The  selected 
applications  are,  we  believe,  important  and  illustrative  of  the  many  aspects 
of  the  proposed  method:   its  computational  power,  versatility,  simplicity 
of  implementation  as  well  as  certain  limitations,  resulting  mainly  from 
the  number  range  restrictions. 

We  begin  by  describing  the  evaluation  of  polynomial  and  rational 
polynomial  functions  of  a  single  variable.  As  will  be  seen,  any  polynomial  is 
L-reducible,  which  is  not  true  for  rational  functions.   However,  L-reducibilit; 
of  a  rational  function  can  be  assured  by  taking  into  account  the  appropriate 
conditions  when  the  coefficients  of  the  rational  function  are  being  defined. 
The  results  of  these  considerations  are  then  used,  to  provide  a  table  of 
some  elementary  functions  in  order  to  illustrate  the  time  and  complexity 
requirements  of  the  E-method  in  such  applications.  While  a  particular 
function  or  a  limited  set  of  functions  can  be  evaluated  more  efficiently 
in  some  other  techniques  [VOL59>  DEL70,  WAL71]  the  E-method  provides  a 
fast  and  a  reasonably  efficient  (in  a  sense  that  a  required  computing 
structure  in  a  particular  application  is  obtained  by  repetition  of  the 
same  simple  basic  part)  technique  of  evaluation  for  practically  an 
unlimited  number  of  functions.   The  compatibility  of  the  basic  arithmetics 
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with  the  proposed  method  is  also  considered  with  an  emphasis  on  division. 
By  looking  at  division  as  an  L-reducible  problem,  an  algorithm  was  derived 
which  has  a  desirable  way  of  generating  the  quotient  digits  deterministically 
and  which  can  also  use  redundantly  represented  partial  remainders.   This 
algorithm  appears  functionally  equivalent  to  the  division  algorithm 
described  by  Svoboda  [SVO63].   After  giving  some  examples  of  the  application 
of  the  E-method  in  evaluating  certain  arithmetic  expressions,  such  as 
multiple  products  or  sums,  inner  products  etc.,  this  chapter  is  concluded 
with  a  brief  consideration  of  the  E-method  as  a  linear  solver. 

We  have  made  no  attempts  presently  to  consider  the  implications 
the  E-method  may  have  in  fields  like  digital  signal  processing  or  linear 
control  systems.   It  may  be  expected  that  the  proposed  computational  technique 
can  be  efficiently  utilized  in  many  special  purpose  computing  systems  or 
devices. 

In  Chapter  3  it  was  indicated  that  the  basic  performance  factors, 
i.e.,  the  speed  and  the  cost  can  be  easily  deduced  from  the  computational 
graph  for  a  particular  application  of  the  E-method.  While  considering 
problems  of  evaluating  polynomials  and  rational  functions,  we  will  attempt 
to  provide  relative  measures  of  performance  with  respect  to  a  conventional 
sequential  or  S-method  and  a  parallel  or  P-method  of  computation.  We  will 
use  the  speedup  S,  efficiency  E  and  cost  C  factors  as  defined  by 
Kuck  [KUC7*+],  t>u"t  under  different  assumptions.   Namely,  as  a  matter  of 
hardware  implementation  complexity  standardization,  we  will  assume  that 
a  processor  used  by  S-,  or  P-method  is  equivalent  in  its  capability  and 
complexity  to  an  elementary  unit  of  the  E-method.  We  will  also  consider 
all  three  methods  in  a  domain  of  operations  on  a  hardware  level  so  that 
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the  "usual  assumption  that  each  arithmetic  operation  takes  the  same  unit 
time  does  not  hold  here.   The  relative  performance  measures  for  the  other 
application  examples  can  be  derived  in  a  similar  way  and  are  omitted  here. 

k.2     Evaluation  of  Polynomials 

Let 

u 
P  (x)  =  Z  p  x1  (k.l) 

^     i=0 

be  a  u-degree  polynomial  with  real-valued  coefficients  {p.}  and  the 

argument  x  e  [a,b].   Define  the  coefficient  and  the  argument  norms  as 

follows : 

HpJI  =  max  |pi|  (k.2) 

i 

||x||  =   max   |x|  (k.3) 

xe  [  a.,  b  ] 

assuming  that  p  and  x  are  vectors  with  p. 's  and  all  representable  values 

of  x  e  [a,b]  as  the  components,  respectively. 

If 

fell  <  5 

and  (h.k) 

||x||  <  a 

then  the  problem  of  evaluating  the  polynomial  P  (x)  is  immediately  reducible 
to  the  system  L:  A(x)  y_  -   b  of  order  n  =  u+1  according  to  the  following 
correspondence  rule  C  : 

r   1       for  i=J ; 
a.  .  =    /  -x       for  j=i+l,  i  <  u  (^.5) 

I  0       otherwise 
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p.      for  i=l,2, . .  ,,n+l 
b.  =    ^  (U.6) 

0       otherwise 

as  illustrated  in  Figure  U.l.   The  system  L  is  then  solved  using  the 

Algorithm  E  so  that  after  m+1  steps 

m+1   /  .  v 
y  =  E  d(jj  r"J  (U.7) 


satisfies 


and 


3=1 


P/x)  -  y*|  <  r"m  (4.8) 


1  \t  k-i+1    *i  .  -m  /1,  n^ 

'     pk  x     "  yi'   r  ^  9^ 

k=i-l 

for  i  =  2,3,  . .  .,1-1+1. 

The  computational  graph  G  is  shown  in  Figure  h.2..     All  the 
elementary  units  EU.,  i  =  1,2,... ,n  are  identical:   each  one  requires  an 
adder  with  one  redundant  (w.   )  and  one  nonredundant  (x«d.   )  operand, 
or,  effectively,  a  conventional  adder,  where  r=2. 

In  general  the  conditions  on  norms  (h.k)   may  not  be  satisfied  and 
an  appropriate  scaling  of  p.  's  and  x  must  "be  done  prior  to  the  evaluation. 
The  required  scaling  in  this  case  presents  no  problems. 

Let  S  =  (s. .)  .,   be  a  scaling  matrix,  defined  as  follows: 

ij  nXn  &      ' 

-(i-l)a 

f  r  for  i=j 

.„-  (U.10) 

I  0  otherwise 
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Figure  k.2     Computational  Graph  G, 
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where  a  is  a  positive  integer  or  zero  when  no  scaling  is  required.   Then 

-1  (i-l)aA 

its  inverse  S  '  is  also  a  diagonal  matrix  with  s. .  =  r       for  i=j. 

-'-J 

Consider 


S"1  A(x)  S  S"1  v_  =   S"1  b 


(^.11) 


Let 


-1 


A(x)  =  S"  A(x)  S 


i  =  s"1  Z 


(b.12) 


and 


b  =  S-1  b 


If 


-a 


A  <  a 


i.e., 


Got 


x 


r  a 


°A  = 


if  ||x||  ^  a 


otherwise 


(^.13) 


then 


||A(x)||  <0C 


as  desired.      However,    since 


Ml  <:  r    A||b||  , 


(h.lk) 


additional  scaling  must  be  performed,  as  described  in  Section  2.9,  if 
||b||  ^  £.   It  can  be  seen  that  this  right-hand  side  scaling  requires 

if  101  t  i 


T      P 


(U.15) 


v. 


otherwise 
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Note  that  the  scaling  of  the  matrix  A  does  not  introduce  change  in  the 
prescribed  number  of  steps  since  y,  =  y  ,  by  definition  of  the  scaling 
matrix  S.   However,  to  satisfy  £  bound,  a     extra  digits  may  be  necessary 
and  therefore  the  time  to  evaluate  a  polynomial  P  (x)  is,  in  general, 

TE(P^)  =  (m+l+ab)  tQ  (k.16) 

It  is  interesting  to  observe  that  if  no  scaling  is  required,  the  time  of 
evaluation  is  independent  of  the  degree  of  a  given  polynomial. 

After  a     and  a     are  determined,  the  scaling  is  performed  by 

I\  D 

shifting,  i.e., 

"°A 
a.  .  ,  =  -  x*r  for  i  =  1,2, ...,u. 

i,i+l  '  '    ' 


-(^-(1-1)0 ■) 

b.  =  p.   «r  for  i  =  l,2,...,u+l 

l    l-l  '  '   ' 


(J+.17) 


Example  k.l: 

Consider  the  scaling  requirements  for  the  polynomial  approximation 
fojp(x)  ~  P 'q(x),  x€[l/2,l]  with  a  precision  of  eight  decimal 
digits.   Using  the  coefficients  {p.}  from  [HAR68,  L0G2  2508,  p.  110, 
p.  225],  and  assuming  that  r  =  2,  m  =  27,  (;  =  3/k,  a.   =  1/8  we 
obtain : 

a  =  3     since   ||x||  <  1 

a     -  29    since   ||b||  <  3'227 

Therefore,  the  required  number  of  steps  to  evaluate  given 
polynomial  becomes 


m'  =  m  +  1  +  a,  =  57 
b 


D 


71 

Example  k.2: 

Evaluation  of  a  polynomial  as  an  approximation  to  2  ,   x€[0, 1], 
with  a  precision  of  7  decimal  digits  for  x  =  0.5  is  illustrated 
in  Figure  k.3.      The  coefficients   of  P,-(x)   are  from  [HAP68, 
EXFB  10te,    p.    102,    p.    20i+]  : 

P0  =  0.999999925 

px  =  0.693153073 

p2   =  0.2l|01536l7 

p  =  0. 558263130* 10"1 
p.  =  0. 89893^003*  10"2 
p^  =  0. 187757667* 10"2 

5 
The  parameters  are  r  =  2,   n  =  6,    £  =  3/k,   a  =  l/8,  a  =  3> 

Pi. 

ab  =  7,  m  =  2i+  +  1  +  7  =  32.   For  x  =  0.5, 

1  r     *i    -2k 
|/2  -  yj  <2  *\ 

a 

We  now  consider  the  performance  of  the  E-method  in  polynomial 
evaluation  relative  to  the  S-,  and  P-methods,  as  defined  in  Section  k.l. 
Both  these  methods  are  assumed  to  be  using  an  iterative  multiplication 
algorithm  so  that  the  basic  processing  unit  in  all  three  methods  can  be 
considered  equivalent  in  complexity  and  speed.   Furthermore,  we  assume  a 
fixed-point  representation  domain  with  no  scaling  requirements.   Denoting 
addition  time  as  t  ,  we  have  the  evaluation  times  and  the  number  of 
processors  for  E-,  S-,  and  P-method  as  follows: 
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7U 
TE(V  =  (m+l)  V  °E  =  ^  +  X 
TS(P^)  =  n  m  tQ,  ng  =  1  (U.18) 

Tp(P^)  «  (%2  n  +  (0%2  |i)l/2)m)  tQ,  np  -  2n 

We  have  assumed  that  the  P-method  uses  Maruyama-Munro- Pater son  algorithm  for 
parallel  polynomial  evaluation  as  given  in  [KtM7*+]. 

Following  Kuck  [KUC7^]>  the  E-method  algorithm  for  polynomial 
evaluation  has  the  speedup  factor  S  : 

E 

the  efficiency  factor  E^: 

E 

E 


and  the  cost  factor  C_: 

E 


CE  =  ryTE  =   (n+l)(m+l)  tQ  (U.21) 


Similarly  for  the  P-method  algorithm: 

T 
S  S  ._    u  m      _  _u_ 

P       TP       7m      /M+m      n/~M 

E     =^«^ £-  (11.22) 

nP       2nTm     ^M+m 

C     «  2\i    (M  +  nTm-hi)  t 

where  M  =  %pn.     As  a  reasonable  measure  of  performance,    Kuck  suggests  the 
ratio  of  effectiveness,   given  by  speedup,    and  cost  of  the  method  so  that 
the  A-method  is  better  in  performance  than  the  B-method  if 
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SA       Sb\  SA 


-  \t '  sl  *i  (U-23) 


It  can  be  easily  seen  that 

..      SE              m  1 

lim  77-   =  7z *  —r— 

|i-*»     e        (m+1)  tQ  0 

while  (h.2h) 

SP 
lim  -E  =  0 

SE  SP 

With  respect  to  the  precision,   both  lim  —  =  0  and   lim  —  =  0  but 

m-x»     e  m-x»     p 

S  C  2n%n 

lim^.gi-     =  "       >     1     f or     n  >  1  (U.25) 

m^oo     E  P 

This  indicates  that  the  E-method  algorithm  is  better  in  performance 
noticeably  than  any  P-method  algorithm  for  the  evaluation  of  polynomials 
under  the  previously  stated  conditions.   Furthermore,  the  E-method  algorithm 
has  a  behavior  of  performance  measure  qualitatively  different  from  the 
considered  P-method  algorithm,  as  illustrated  in  Table  k.l   and  Figure  k.k. 

In  this  example  it  is  assumed  that  m  =  56,  r  =  2  and,  for  presentation 

_2 
convenience,  that  tn  =  10   units. 

The  conclusion  that  the  E-method  offers  better  performance  in 

terms  of  speed  and  cost  than  a  P-method  in  polynomial  evaluation  could  be 

made  even  stronger  if  the  rather  complex  control  and  communication 

requirements  of  a  P-method  were  compared  with  those  of  the  E-method.  To 

emphasize  again,  these  conclusions  are  limited  by  the  imposed  assumptions, 

mentioned  earlier.   The  speed  of  a  P-method  algorithm  could  certainly  be 

increased,  by  using  sophisticated  multipliers  beyond  the  speed  of  the 


Table  4.1  Performance  Factors 
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n 

EE 

EP 

Vt0 

V0 

SE 

S 
P 

SE 

CE 

SP 
CP 

2 

0.65 

0.49 

57 

57 

1.96 

I.96 

1.14 

0.86 

4 

0.78 

0.35 

81 

3.92 

2.77 

1.38 

0.62 

8 

0.87 

0.28 

102 

7.84 

4.38 

1.53 

0.49 

16 

0.92 

0.24 

118 

15.68 

7.58 

1.61 

0.42 

32 

0.95 

0.21 

133 

31.36 

13.46 

1.67 

0.38 

77 


c 


1.5  ■- 


1   -- 


0.5  - 


Figure  k.k     Evaluation  of  Polynomials 
Performance  Measures 
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E-method,  but  at  the  expense  of  highly  increased  cost.   The  similar 
conclusions  about  the  relative  performance  of  the  E-method  with  respect 
to  the  P-method  of  polynomial  evaluation  with  combinational  arithmetic 
nets,  as  proposed  by  Tung  and  Avizienis  [TUF70],  were  found  to  apply. 


k.3     Evaluation  of  Rational  Functions 

The  correspondence  rule  C  which  defines  a  linear  system 

K 

L:  A(x)  y  =  b  for  a  given  L- reducible  rational  function  R   (x)  has 
—    *-   —  |i,  v 

already  been  given  in  Section  2.2.  Algorithm  E  can  be  carried  out  on  a 

configuration  represented  by  the  graph  G  with  n  =  max  (n,v)  +  1,  as 

R 

illustrated  in  Figure  U.5.   The  basic  recursion  (2.50)  becomes 


w(3)  =  d(j)  +  zti)  =  r(z(j-D  +  g.dp-1)  +  g.  .  .d.^:1)) , 

X        1        1         V  1         &ll  1         &1, 1+1  1+1   '     ' 


with  g±1  =   -  q.^  and  ^     =  x  , 


(k.26) 


i  =  2,  . .  ,,n-l, 

trivially  modified  for  i=l  (g.,=0)  and  for  i=n  (g     =0,  d  -,=0).   The 
J  Vtoil  '  Vton,n+1  '  n+1  ' 

adder  structure  of  the  elementary  unit  needs  to  accommodate  at  most  two 
nonredundant  and  one  redundant  operand  and  that,  for  radix  2,    can  be  easily 
achieved  with  a  two-level  conventional  adder. 

The  conditions  under  which  a  rational  function  is  L-reducible 
will  be  considered  in  some  detail.   Let 


P  (x) 

r    (x)  =   y- ,  s 


H 

z 

P- 

l 

X 

i=0 

i 

V 

i 

Z 

\ 

X 

i=0 
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be  a  rational  function  with  real-valued  coefficients  {p.}  and  (q.)  and  the 
argument  xe[a,b],  assuming  without  loss  of  generality  that  q  =  1.  Define 
the  following  norms : 

HpJI  =  max  |Pi  | 
i 

||oJ|  =  max  |q.  |  (i+.27) 

i/0 

||x||  =   max   |x| 
xe  [  a,  b  ] 

Then  by  the  conditions  of  Theorem  2.3,  the  following  must  hold: 

(i)   |[q+x||  <a 

(U.28) 

(ii)  h\\<i 

The  condition  (i )  appears  as 

fell  <  a  (^.29) 

for  the  first  row  of  A(x)  and  as 

|qj  <a  (U.  30) 

for  the  (v+l)-st  row  if  \x  <  v. 

The  scaling  implications  of  the  condition  (ii)  have  already  been 
discussed  in  Section  2.9  and  will  not  be  reconsidered  here.   The  condition 
(i),  for  practical  purposes,  can  be  expressed  as 

(i«)      ||qj|  <a(l-c),    0  <  c  <  1 

(U.31) 

(i")     ||x||<ca 

Supposing  that  (itf)  is  satisfied  and  utilizing  any  R   (x)  with 
q  =  1,  such  that 


81 


h±\  <^ra  >    ±  =  iA...,v  (4.32) 

is  L-reducible  and  can  be  evaluated  in  0(m)  steps  according  to  Algorithm  E. 

While  a  given  rational  function  may  or  may  not  satisfy  (ilf)  and 

(4.32),  these  conditions  can  certainly  be  taken  as  additional  constraints 

when  the  coefficients  {q.}  of  a  rational  function  are  being  determined. 

This  is,  in  particular,  applicable  when  the  coefficients  of  the  rational 

function  are  obtained  using  linear  programming  techniques.   It  may  be 

noted  that  certain  special  cases  may  arise  such  that  one  can  always  take 

advantage  of  their  presence.   For  example,  whenever  q.  =  0,  all  q.,  j  >  1, 

can  be  reduced  by  factor  c.   The  multiple  reductions  are  also  possible  for 

each  q.  if  q.  =0  for  more  than  one  i  <  j.   Similarly,  if  |q.  |  <  a,  the 
j     J-  1 

range  of  argument  x  can  be  increased.   If  |x|  >  a,    one  can  consider 
evaluating  R'   (l/x)  where,  hopefully,  p.1  =  p   . ,  q!  =  q^  .  for  v  >  u, 
satisfy  the  required  conditions,  etc. 

Example  2.3,  given  at  the  end  of  Chapter  2,  with  Figure  2.4, 
illustrates  the  evaluation  of  a  rational  function. 

Let  us  consider  the  performance  of  the  E-method  in  evaluating 
rational  functions  under  the  same  assumptions  as  in  the  previous  section. 
The  sequential  S-method  has 

assuming  that  two  digits  of  a  multiplier,  or  quotient  at  the  end  of  the 
evaluation,  can  be  retired  at  each  step.   Since 

VW  =  (m+1)  *0 
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the  speed-up  of  the   E-method  is 

s      _  JjS  _.  u+v+1 


E        TE  2 


with  an  efficiency 


E      -  —  ~  ^+V+1 

E        n  '"  2(max(u,v)+l) 

For  \i=v,    E     >  0.75.     We  know  of  no  P-method  algorithm  for  which 

E  — 

TP(Ru,v}  -TP(Pmax(u,v)) 
and 

E_(R   )  <  E_(P   t        s) 

so,  on  the  basis  of  the  conclusions  made  in  the  previous  section,  we  may 
again  conclude  that  the  E-method  is  faster  and  more  efficient  than  a 
P-method,  under  the  stated  assumptions. 

k.h     Evaluation  of  Elementary  Functions 

With  a  capability  to  efficiently  evaluate  polynomial  and  rational 
functions,  the  E-method  can  certainly  be  used  for  the  evaluation  of 
arbitrary  functions  for  which  suitable  polynomial  or  rational  approximations 
exist.   Hence,  the  evaluation  of  a  given  function  would  be  characterized 
mainly  by  the  corresponding  set  of  coefficients,  which  can  be  kept  in  a 
local  storage  area  of  the  computing  configuration,  or  on  any  other 
convenient  level  in  the  available  storage  hierarchy.   Furthermore,  the 
evaluation  could  be  performed  easily  in  a  variable  precision.   In  general, 
given  a  sufficient  number  of  the  two-input  elementary  units,  an  evaluation 


83 


would  take  TL(f )  =  O(lQm)  t  ,  where  t  is  a  gate  delay  and  the  radix  is 
Ev  g'        g 

binary.   Since  the  relationship  between  the  speed  and  cost  of  the  E-method 

is  linear,  a  wide  range  of  evaluation  requirements  can  be  easily 

accommodated. 

The  E-method,  as  described  before,  imposes  certain  conditions  on 

the  size  of  the  operands  which,  in  general,  prevent  the  use  of  an  arbitrary 

rational  approximation.   However,  one  can  define  desired  rational 

approximations  under  those  conditions  to  be  optimal  with  respect  to  the 

E-method.  We  have  not  attempted  here  to  derive  such  optimal  approximations. 

Rather,  we  have  decided  to  provide  an  overview  of  the  E-method  parameters 

for  several  previously  specified  approximations  of  certain  functions  [HAR68] 

so  that  the  performance  could  be  estimated.   Table  k.2   displays  these 

examples.   The  precision  of  the  approximations,  given  as  m  radix  2  digits, 

is  based  on  the  relative  error  criterion  except  for  the  last  four  examples. 

The  effects  of  scaling  appear  in  the  actual  number  of  steps  m'.   The 

variations  in  overlap  A,  defined  in  Section  2.k,    are  sometimes  necessary 

to  accommodate  given  coefficients  or  the  argument  range  but  have  no 

significant  effect  on  the  step  time  t».   The  required  number  of  elementary 

units  is  given  in  column  n.  A  computational  configuration  consisting  of 

elementary  units  does  not  preclude  the  existence  of  a  separate  fast 

multiplication  unit  so  that  in  place  of  R   (x),  a  more  efficient  form 

2 
R   (x  )  can  be  used.   Note  that  for  the  form  R(x)  =  x  ^  p'  ,  which 
2>2  Q(x  ) 

sometimes  appears,  the  correspondence  rule  C  can  be  easily  modified  as 

R 

follows : 


Table  4.2  Examples  of  Function  Evaluation  Requirements 
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Function 

Approxi- 
mation 

Index 

Argument 
Range 

n 

m 

m1 

A 

2x 

V(x) 

EXPB 

1103 

c°»^ 

6 

82 

83 

1 

35 

2X 

P8(x) 

EXFB 
1045 

[0,1] 

9 

1+0 

47 

1 
2 

iox 

P?(x) 

EXPD 
1404 

t°'& 

8 

47 

50 

1 
2 

X 

e 

R3)3(x) 

EXPEC 
1800 

C°'^ 

4 

41 

43 

1 
IS 

X 

e 

R5,5(X) 

EXPEC 
1802 

[°'^] 

6 

75 

77 

1 
IT 

sinh(x) 

VW 

SINH 
2002 

[0,  §] 

5 

45 

46 

1 
2 

H10£f) 

xPlt(x2) 

IDG  10 

2303 

[2/2-3, 

3-2/2] 

6 

41 

44 

1 
2 

hM> 

V3^ 

IOGE 
2663 

[2/2-3, 
3-2/2] 

5 

40 

43 

1 
2 

sin(^  x) 

R5>u(x) 

SIN 
29UI 

r*$ 

6 

43 

52 

1 
15 

sm(^  x) 

xPu(x2) 

SIN 
304l 

[0,1] 

6 

37 

43 

1 
2 

85 


Table  k.2     Examples  of  Function  Evaluation  Requirements  (continued) 


Function 

Approxi- 
mati  on 

Index 

Argument 
Range 

n 

m 

m' 

A 

cos(^  x) 

R3  2(x2) 

COS 
381+2 

[0,  g] 

7 

1+4 

1+7 

1 

IT 

tan(£g    x) 

xR^2(x2) 

TAN 
1+122 

[0,1] 

1+ 

1+2 

h5 

1 

arcsin(x) 

2 
xR1  2(x    ) 

ARCSN 
1+602 

[0,sin^] 

1+ 

39 

1+1 

1 

arctan(x) 

R3A(x) 

ARCTN 
5011 

[O^tan^-] 

5 

l+l 

^5 

1 
2 

r(x) 

Vx) 

GAMMA 
5208 

[0,1] 

10 

32 

1+8 

1 

2 

r(x) 

R6,3(x) 

GAMMA 
5231 

[0,|] 

7 

38 

1+0 

1 
IS 

j      ^r(^) 

P5(x) 

LGAM 
5I+OI 

[10~3,2~3] 

6 

37 

38 

1 
2 

^r(-) 

X 

i 

R3  2(x) 

LGAM 
5I+6O 

[io-3,^] 

1+ 

1+2 

43 

1 
5 

! 
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_0 


-x  0 


A(x2) 


r  0^ 


Z 


while  y  =  R(x),  as  "before. 

It  may  "be  of  interest  to  mention  here  that  Algorithm  E  can  be 
easily  adapted  for  the  evaluation  of  a  given  function  on  a  set  of  argument 
values,  which  is  of  practical  importance  in  many  numerical  problems.   Let 

us  consider  the  case  where  a  function  f(x)  is  to  be  evaluated  on  a  set  of 

-k 
equidistant  argument  values  3C  =  {x.  x.=x.  -,+Ax,  Ax  <  r  },  using  n 

&     3      3      J-1 

elementary  units.  Assume  further  that  the  set  X  is  such  that  the  first 
I  <  k  digits  of  the  argument  remain  invariant.   Since  Algorithm  E  generates 

the  result  in  a  left-to-right,  digit-by-digit  fashion,  it  is  sufficient 

(o) 
to  provide  2n  additional  registers  to  preserve  w    for  continuation  as 

well  as  a  facility  for  updating  the  argument  by  Ax.   These  modifications 

are  compatible  with  those  required  for  a  variable  precision  mode  of 

operation,  as  discussed  in  Section  3.^.   For  an  m  digit  precision,  the 

time  to  evaluate  f(x),  x  e  X,  ,  after  the  first  evaluation,  would  be 

T£(f)  -  (m-i)  tQ 

so  that  the  total  time  for  the  evaluation  over  )C  is  reduced  by  m/(m-i)  times 

U.5  On  Performing  the  Basic  Arithmetics 

The  generic  algorithm,  Section  2.6,  by  definition  of  the  recursive 
step  can  clearly  be  used  for  additions/ subtractions  and  multiplications. 
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The  results  obtained  will  be  in  a  redundant  form  so  that  the  conversion  to 
a  conventional  representation,  when  necessary,  would  require  a  carry 
propagation  facility.   By  considering  division  as  an  L-reducible  problem, 
the  quotient  and  the  remainder  can  be  generated  by  using  the  general 
algorithm  of  the  E-method  in  a  deterministic  fashion,  with  basic  recursive 
step  executable  in  time  independent  of  the  length  of  the  operands. 
Consider  the  division  problem 

B  =  AQ  +  R 
m    m 

where 

A  is  the  divisor,  1  -  a  <  |a|  <  1  +  CC;  (^.33) 

B  is  the  dividend,  |b|  <  |a|  <  {; ; 

0  is  m  digit  quotient  and 

R  is  the  corresponding  remainder. 
Consider  a  degenerate  system  L:  A  y_  =  b  of  order  1,  defined  by  the  division 
correspondence  rule  C  : 

b  =  b  =  (signA)B 
Let  g  -   1  -  a  so  that 

y  =  b  +  gy  (^.3*0 

where 

B 
y  =  A 
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m 


can  be  generated  using  Algorithm  E.   It  is  easily  shown  that  y 

,   (m+1)   ,  .  _ 
and  wv     satisfy 

IQ.-/I  <r"m 


3=1 


and 


m 


_     (m+1)  -m-1 
R  =  w      r 
m 


(*.35) 


The  conditions  (U.33)  require,  in  general,  an  initial  transformation 
of  the  given  divisor  A'  and  the  dividend  B'.   For  example,  assuming 
r  =  2,  A  =   0  and  1/2  <  |A'|  <  1  as  usual,  the  simplest  transformation  of 


the  divisor  could  possibly  be 

r 

2  |A'| 


A 


3 


[§, 


8 


=     (      2    lA'l        f0r  lA'l6    \ 


A' 


r5    3>, 
[8  '  V 


[ft-  1) 


(^.36) 


This  transformation  would  become  slightly  more  complicated  when  the 

the  transformation  could 
j-o-      3C 

be  carried  according  to: 


1       7 
selection  overlap  A  f   0.   For  A  =  -=rr,  oc   =  — 


r 


2  |  A'  | 

N  -  (     |  |A« 


r 


for  | A ' | e    ( 


rl    12) 

L2  '   32' 


rl9  21) 

L32  '  32 ; 

T^  1) 

L32  '  1; 


(^.37) 


which  is  not  a  serious  complication  in  exchange  for  a  carry-propagation 
free  division  algorithm,  indeed. 
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As  shown  in  Figure  k.6,    both  division  and  multiplication  require 
one  single  input  elementary  unit.   In  the  case  of  division  there  is  a 
cyclic  graph  representation. 

Example  k.3: 

Compute  the  quotient  Q  and  the  remainder  R  using  Algorithm  E, 
with  r  =  2,  m  =  5>  the  dividend  B  =  3/*+  and  the  divisor 
A  =  5/k   (g=-l/lf) 

j  w(j)  d(j)  ZU) 

1  1.1  1  o.i 

2  0.1  1  -0.1 

3  -1.1  1  -0.1 
i4                                       -0.1                                           1  0.1 

5  l.l  l  o.l 

6  o.l 


(d)  o-j 


Q     =     S     dvj;  2~J    =  0.11111-0.10011    (B/A  =  3/5   =  0.1001) 
5        d=l 

R     =  w*    '  2        =  0.0000001  so  that 
5 

B   =   QJl  +  Rc 

a 

U.6  Evaluation  of  Certain  Arithmetic  Expressions 

A  set  of  elementary  units  can  be  conveniently  applied  in  evaluating 
certain  expressions  in  0(m)  recursive  steps  according  to  the  algorithm  of 
the  E-method.  We  consider  first  two  basic  nontrivial  arithmetic  expressions 

P 

p  =  n  c  (^.38) 

c    .  ,   1 
i=l 
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03 


o 

•rl 

■s 

O 

ft 
•H 
-P 

3 


-d 


O 


•H 
P 

O 
<M 

W 
& 

s 

o 

•H 

I 

o 


-3- 


o 
O 
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and 


s 

Z 

i=l 


0*.39) 


Being  a  degenerate  case  of  a  polynomial,  the  multiple  product  P  can 
be  easily  evaluated  in  (m+l)  steps  with  p  elementary  units  according  to  the 
following  correspondence  rule : 


a.  . 


b. 

1 


for  i=j; 

for  j=i+l,  i=l, ...,p-l 

otherwise 

for  i=p 
otherwise 


{k.kO) 


Clearly, 


P 

|  n  c 
k=i  i 


yi 


<  r 


-m 


i=l,2, ...,p 


so  that  for  c.  =  x,  V  i,  the  E-method  generates  the  positive  integral 


powers 


of  x  :  x  ,x  ,x  ,  ...,3£*  in  (m+l)  steps  on  p  elementary  units.  As 


before,  it  is  assumed  that  factors  c.  satisfy  the  range  conditions. 

The  multiple  sum  S  can  be  evaluated  by  the  E-method  as  illustrated 
by  the  following  example. 


Example  k.k: 


To  evaluate  S     =     Z     c.,    consider  the  following  L  system: 
C       i=l     x 


1        -  cc/2         -  a/2 
1 

1  -  a/2        -  a/2 

1 

l 


~yr 

"bi" 

y2 

\ 

X 

y3 

= 

b3 

yk 

\ 

-y5- 

b5 
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k  ft 

For  a  =  1/8,  y  =  b  +  2~  (b  +b  )  +  2~  (b.  +b  ),  which  indicates 

the  necessary  relations  between  b. *s  and  given  c. ' s.   If  necessary,  scaling 

on  b  vector  can  be  performed  as  before.  □ 

Although  fully  consistent  with  the  E-method,  this  approach  is  not 

a  very  efficient  way  of  forming  sums,  since  a  k-input  elementary  unit  alone 

has,  by  definition,  a  capability  to  add  (k+l)  operands,  and  it  would  be 

desirable  to  exploit  this  more  efficiently.   Let  us  briefly  consider  the 

alternate  ways  of  evaluating  multiple  operand  sums.   For  the  sake  of 

simplicity,  it  will  be  assumed  that  the  available  elementary  units  have  k 

or  more  inputs,  so  that  (k+l)  operands  can  be  reduced  to  a  redundantly 

represented  sum  in  one  step,  i.e.,  in  time  t  .   Let  there  be  at  least 

n  =  (k  -l)/(k-l)  available  elementary -units.   Then  the  elementary  units  can 

be  arranged  as  an  i-level,  radix-k  tree,  illustrated  as  the  graph  G  in 

o 

Figure  ^-.7.   If  the  links  between  the  nodes  of  G  were  capable  of  carrying 

o 

in  parallel  the  full  precision  results,  then  s  =  (k+l)n  operands  could  be 
added  together  in  £   steps,  assuming  that  no  additional  operands  are 
allowed  to  enter  computing  scheme  once  the  evaluation  has  started. 

However,  if  only  single  digit  interconnections  between  elementary 
units  are  available,  as  we  assume  to  be  the  case  for  the  E-method,  then 
the  same  number  of  s  operands  would  require  (i+m-fl)  steps.  With  the 
displacements  of  operands,  relative  to  a  level  of  the  tree  (Figure  h.7), 
it  can  be  assured  that  the  most  significant  digit  of  the  sum  appears  on 
the  top  level  after  £   steps,  followed  by  one  more  digit  for  each 
additional  step. 

The  evaluation  of  the  inner  products  can  be  made  compatible  to  the 
E-method  as  follows.   Let 
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<ss: 


Figure  k.7     Computational  Graph  for  Multi-Operand 
Summation 


91+ 


U  =    (u^Ug,  ...,u    ) 


and 


be  two  real-valued  vectors  with 


(^D 


(4.142) 


W  =  (u,  v)  =  2  u.  v. 
i=l 


as  their  inner  product.   Define  a  (p+l)  order  system  L  as  follows 


(^3) 


10 


-u. 


b.  = 

i 


L 


u 


for  i=.j 

for  i=l  and  j=2, ...,p+l 

otherwise 

for  i=l 

for  i=2, . . .  ,p+l 


Then 


since 


-1 


71   =  W 


1  .    u 


0  ' 


(k.kh) 


(hM) 


If  one  p-input  elementary  unit  is  available,  the  inner  product  W 
can  be  evaluated  in  (m+l)  steps  where  the  remaining  p  elementary  units 
can  be  shift  registers.,   However,  when  only  k  <  p  input  elementary  units 
are  available,  modifications  to  the  above  given  correspondence  rule  are 
required.   One  obvious  way  would  be  to  form  independently  [p/k]  partially 
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accumulated  inner  products  W.  and  then  perform  the  final  summation  using 
one  of  the  previously  described  approaches,  as  indicated  in  Figure  k.Q. 
The  other  possible  approach,  more  in  the  spirit  of  the  E-method,  is  analogous 
to  Example  h.k,   and  it  is  illustrated  by  the  next  example: 


Example  k.5 


Evaluate  W  =  £  u.  v.  assuming  that  available  elementary  units 

i=l  1  ± 
have  no  more  than  three  inputs.  A  corresponding  system  L  may 

appear  as  follows: 


-  u. 


-  Ug   -  a/3 


-  u. 


-  u, 


-  u 


-"1 

"yl 

y2 
y3 

0 

vi 

V2 

5 

X 

y5 

= 

0 

3  v 
a     3 

a  \ 

y6 

y7 

a     5 

_ 

_         — 

Then  y^  =  W.  Only  two  3-input  elementary  units  are  required;  the 
remaining  five  need  not  be  more  complex  than  shift  registers.      □ 
The  previous  example  can  be  readily  generalized  for  a  p-dimensional 
inner  product  evaluation  on  k- input  elementary  units.  The  scaling  may, 
however,  introduce  a  tolerable  number  of  extra  steps.   For  example,  let 

a, 


l1  —  k  '       '  i1  —  = 


For  p  >  k  it  is  easily  seen  that 
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Figure  k.Q     Computational  Graph  for  Inner 
Product  Evaluation 


so  that  an  extension  of 


Vi=&bAL 


k, 


V" 
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"-  fW^ra-l 

may  be  required.   For  a  =  l/8,  k  =  8,  r  =  2,  p  =  6*f,  a  =  1+8  while  for  k  =  k, 
a  -   80,  still  of  the  order  of  commonly  used  precision  for  u. '  s  and  v., 's. 

We  conclude  this  section  with  a  remark  that,  in  general,  any 
arithmetic  expression  which  can  be  put  into  correspondence  with  a  system  L, 
can  be  evaluated  in  0(m)  steps.   For  example,  to  evaluate 

f  =  a(cd  +  be) 

the  following  system  L  is  solved 

L    -a 

1    -b 
1 


-c 


where  y  =  f.   Or,  to  compute 


a(f+gc)  +  e(l+cd) 
1  +  ab  +  cd 


one  may  solve 


r 


1  -a  0 
b  1  -c 
0     d    1 


y 


where  y  =  h. 
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k.J     On  Solving  the  Systems  of  Linear  Equations 

Some  general  implications  of  the  E-method  on  the  problem  of  solving 
the  systems  of  linear  equations  are  briefly  considered  here.   By  definition, 
the  proposed  Algorithm  E  is  a  linear  solver  which  can  be  applied  for 
solving  an  n-th  order  nonhomogeneous  system  of  linear  equations 

A  v_  =  b 

with  the  nonsingular  coefficient  matrix  A  in  0(m)  additive  steps  if 

||a  -  i||  <  a 

(hM) 

fell  <  i  ' 

and  each  of  n  elementary  units,  implementing  Algorithm  E,  has  at  least  (n-l) 

inputs,  in  general  (Figure  4.9  (a)). 

When  the  conditions  {k.k7)   are  not  satisfied,  a  scaling,  as  discussed 

in  Section  2.9,  must  be  considered.  Whether  scaling  can  be  achieved  in  a 

practical  way,  depends  also  on  the  type  of  the  coefficient  matrix  A.   It 

can  be  easily  seen,  for  example,  that  for  an  upper  (lower)  unit  triangular 

matrix  A,  scaling  procedure,  analogous  to  that  defined  for  polynomials, 

can  be  applied  in  a  practical  way.   Consider  a  unit  upper  triangular 

matrix  A  =  (a. . )  with  a. .  =  1  and  a. .  =  0  for  i  <  j.   Define  the  scaling 
-     ij        n         ij 

matrix  S  as  a  diagonal  matrix 

S  =  diag(l, r  ,r  ,...,r^     ) 

where  r  is  the  radix  and  o   a  positive  integer.   Then 

A  =  S  A  S"1 
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gl: 


(a) 


6 


TD.    I  1 


ozecec-^ 


(b) 


Figure  k.9     Computational  Graphs  for  Dense  and  Tridiagonal 
Systems  of  Linear  Equations 
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can  be  made  to  satisfy  the  required  conditions  by  choosing  an  appropriate 
value  for  a.   Consequently,  the  number  of  steps,  required  to  generate  the 
first  m  correct  digits  of  the  solution  y_  is  increased  by  no  more  than  (n-l)cr 
steps. 

For  practical  reasons,  the  number  of  inputs  to  the  elementary  unit 
may  be  expected  to  be  small.   The  E-method  remains  applicable  with  a  linear 
degradation  of  the  performance  as  considered  in  Section  3.^.  Under  input 
restrictions,  of  particular  interest  are  certainly  sparse  systems  of  linear 
equations.   For  example,  a  configuration  of  two-input  elementary  units, 
represented  by  the  graph  G__  in  Figure  h.9    (b )  can  be  used  to  efficiently 

solve  tridiagonal  systems  for  which  la.  .  I  <  Mia.  .  -,  I  +  la.  .  ,  |  ). 

J  '  n1  -   '  i,i-l'    '  i,i+l'  ' 

The  E-method  has  an  interesting  property  with  implications  on  the 
number  of  operations,  required  to  solve  a  linear  system  by  an  iterative 
scheme.   Consider  an  n-th  order  linear  system 

(I  -  G)  y_  =  b  (1+.J+8) 

solvable  by  the  E-method.   The  system  (k.kQ)   could  certainly  be  solved  by 
a  classical  Jacobi  algorithm,  with  the  basic  computational  step  defined  as 

yU)  =   b  +  GZU-1}   ,    j=l,2,...  (If.i+9) 

Recall  that  the  basic  computational  step  of  the  E-method  is 

i(j)  +  z(j)  =  r(z(;j_l)  +  Gd^),    j=l,2,...        (U.50) 

where  d  is  a  single  digit  vector.   Then,  the  recursion  (h.k-9)   requires  (n-l) 
full  precision  multiplications  and  (n-l)  additions  per  row  per  step,  while 
for  the  same  the  recursion  (^.50)  performs  (n-l)  single  digit  multiplications 
and  (n-l)  additions,  which  for  radix  2  becomes  (n-l)  additions. 
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The  number  of  operations  in  (k.ky)   can  be  considered  redundant  in 
the  sense  that  the  parts  of  the  generated  solution  are  repeatedly  utilized 
although  once  correctly  determined  and  used  the  digits  of  the  solution  v_ 
do  not  contribute  any  critical  information  in  the  subsequent  steps.   In 
other  words,  one  could  reduce  the  overall  complexity  of  an  algorithm  by 
conveniently  removing  the  correctly  computed  result  digits  from  further 
involvment  in  the  algorithm.   This  can  be  easily  achieved  by  a  left-to-right 
processing  and,  interestingly  enough,  by  introducing  the  redundancy  in  a 
number  system  representation,  as  is  being  done  in  the  E-method.   Since  the 
correctly  generated  digits  are  used  only  once,  i.e.,  digit  vector  d 
appears  in  the  recursion  only  at  the  step  (j+l),  we  may  think  of  Algorithm  E 
as  a  minimally  redundant  algorithm  for  solving  a  system  of  linear  equations 
with  respect  to  the  required  number  of  operations.   Thus,  the  redundancy 
removing  principles  of  the  E-method  may  have  an  application  even  in 
programming  iterative  linear  solvers  on  machines  with  a  slow  multiplication 
algorithm. 

However,  as  a  consequence  of  the  minimized  redundancy  on  the 
algorithmic  level,  the  convergence  of  the  algorithm  becomes  fixed, 
e.g.,  linear  for  Algorithm  E.   This  must  be  considered  when  evaluating 
the  merits  of  reducing  the  redundancy  in  number  of  operations  of  an 
iterative  scheme. 
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5.   CONCLUSIONS 

5.1  Summary  of  the  Results 

In  this  dissertation  a  recently  discovered  general  evaluation 
method,  amenable  to  an  efficient  hardware-level  implementation,  is  presented. 
The  proposed  evaluation  method,  referred  to  as  the  E-method,  is  characterized 
"by  several  important  performance  features  and  appears  applicable  in  many 
common  computational  problems. 

Briefly,  the  E-method  consists  of  i)  a  correspondence  rule  C  which 


reduces  a  given  computational  problem  feF  into  an  n-th  order  linear  system 

L,  F  being  the  class  of  problems  which  are  reducible  to  L,  and  ii)  an 
I 

algorithm  with  convenient  implementation  characteristics,  which  generates 
the  solution  to  the  system  L  and,  consequently,  to  the  original  problem  f, 
in  0(m)  recursive  steps,  where  m  is  the  desired  number  of  digits  of  the 
result  precision.   The  recursive  steps  are  invariant  and  each  of  them  is 
executable  in  time  independent  of  the  operand  length. 

The  class  FT  is  illustrated  in  a  sufficient  number  of  problems  to 

Li 

justify  the  claim  of  generality  for  the  E-method.   It  includes,  as  the 

most  practical  examples,  the  evaluation  of  polynomials  and  rational  functions 

so  that  all  functions,  for  which  corresponding  approximations  satisfying 

the  conditions  of  the  E-method  exist,  can  be  efficiently  evaluated.   The 

other  problems,  for  example  solving  certain  tri diagonal  (sparse,  in  general) 

systems  of  linear  equations,  or  the  evaluation  of  certain  types  of 

arithmetic  expressions  or  basic  arithmetics,  can  be  seen  to  belong  to  the 
class  FT . 


ICQ 

The  E-method  requires,  for  the  fastest  evaluation,  a  configuration 
of  n  identical  elementary  units,  but  allows,  in  a  straightforward  manner, 
exploitation  of  its  flexibility  in  tradeoffs  between  the  speed  and  cost. 
The  elementary  unit,  basically,  is  an  s-operand  adder  with  a  corresponding 
number  of  registers.   In  many  applications,  s  remains  very  small.  The 
interconnections  among  the  elementary  units  require  only  single  digit 
links  which  can  be  conveniently  specified  and  controlled  by  a  connection 
matrix  corresponding  to  the  particualar  coefficient  matrix  of  the  system  L. 
The  control  part  of  the  proposed  algorithm  is  simple  and  deterministic  in 
the  sense  that  there  are  no  convergence  tests  to  be  performed  even  though 
the  method  is  iterative.   The  result  is  always  generated  in  a  digit-by-digit 
fashion,  the  most  significant  digits  first,  by  applying  a  simple  invariant 
selection  procedure.   Thus  there  exists  a  possibility  for  an  overlapped 
mode  of  operation  in  a  sequence  of  dependent  computations  as  well  as  for  a 
variable  precision.  A  variable  precision  and  a  multi-point  evaluation 
facility  can  be  incorporated  in  a  straightforward  manner.   The  E-method  has 
favorable  error  properties:   it  is  never  ill-conditioned  and  no  round-off 
errors  are  generated,  by  definition.   In  its  present  form,  the  E-method  is 
restricted  to  fixed-point  or  block  floating-point  number  representation 
formats. 

In  applications,  like  polynomial  evaluation,  the  E-method  generates 
the  result  in  one  carry  propagation  free  multiplication  time,  unless 
excessive  scaling  is  involved,  requiring  an  application  dependent  number 
of  elementary  units.   For  elementary  function  evaluations,  k-T   identical 
2-input  elementary  units  would  suffice.   The  elementary  unit  has  a  simple 
and  easily  implementable  structure. 
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5.2  Comments 


While  the  proposed  method  provides  efficient,  technology-compatible 
solutions  to  many  computational  problems,  like  the  evaluation  of  elementary 
functions,  it  also  brings  together  the  following  issues  which,  we  believe, 
are  of  fundamental  importance  in  the  design  of  algorithms: 

i )  the  choice  of  algorithmic  representation  compatible 
with  the  implementation  environment; 
ii)  the  problem  of  redundancy  on  the  algorithmic  level; 
iii)  the  problem  of  redundancy  in  a  number  representation  system. 

The  first  issue  is  concerned  with  the  problem  of  minimizing  the 
number  of  algorithms  to  be  implemented  in  order  to  solve  a  set  of  different 
problems.  As  is  demonstrated  here,  the  replacement  of  a  given  set  of 
problems  by  a  unique,  isomorphic  problem  gives  rise  to  a  single  algorithm 
to  be  implemented.   The  algorithm  can,  hopefully,  satisfy  the  speed  and 
cost  objectives,  among  the  other  properties.  And  this,  in  general,  would 
imply  a  small  number  of  different  primitive  operators,  simple  enough  to  be 
efficiently  implementable.   In  the  E-method,  addition  appears  as  the  only 
required  primitive  operator.   The  corresponding  algorithmic  representation, 
considered  as  a  way  in  which  the  computations  are  to  be  performed,  has 
direct  implications  on  the  available  parallelism  and  hence,  the  achievable 
speed.   In  a  traditional  approach,  with  four  basic  arithmetics  as  the 
primitive,  indivisible  operators,  the  parallelism  is  exposed  or  introduced 
by  transformations  of  the  original  computation  sequence  so  that  the  time 
dependencies  between  the  required  operations  are  minimized.   The  E-method 
demonstrates  another  approach  in  parallelism  exposition:   a  systematic 
left-to-right,  digit-wise  processing  minimizes  the  necessary  delay  between 
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dependent  computations  and  can  achieve  a  parallelism  up  to  one  digit  delay, 
having  important  properties  like  a  simple,  deterministic  control,  etc. 
Furthermore,  this  approach  is  related  to  the  second  issue.  Namely,  it 
can  effectively  reduce  the  required  complexity  of  an  iteratively  defined 
algorithm,  by  reducing  the  number  of  necessary  operations.   It  also  affects 
the  choice  of  the  primitive  operators.   It  is  of  interest  to  remark  that 
in  many  parallel  algorithms  [KUC7^-],  it  is,  on  the  contrary,  necessary  to 
introduce  redundant  operations  in  order  to  achieve  parallelism. 

The  problem  of  redundancy  in  number  representation  systems  has  long 
been  recognized  as  a  central  issue  in  achieving  efficiently  fast 
algorithms  [ROB58,  AVl6l]  and  it  will  suffice  here  just  to  note  that 
the  E-method  is  another  example  where  the  redundancy  in  number  representation 
has  an  essential  role. 

The  theoretical  basis  of  the  E-method  is  simple  yet,  we  believe, 
extendable  to  other  interesting  applications.   Certainly,  Algorithm  E 
itself  can  be  considered  as  a  primitive  operator  which  can  be  utilized  in 
fast  parallel  schemes,  not  necessarily  of  the  type  defined  by  the  E-method. 

Among  the  problems  of  an  immediate  interest  would  be  the  development 
of  polynomial  and  rational  approximations  to  various  functions,  optimal 
with  respect  to  the  conditions  of  the  E-method.   It  may  be  of  some 
importance  to  note  that  the  E-method  in  these  applications  utilizes  the 
generated  solution  in  a  degenerate  sense  since  only  one,  usually  the  first, 
component  of  the  solution  is  of  practical  interest.   Thus,  it  may  be 
convenient  to  start  with  an  approximation  to  the  given  function  in  the  form 


f(x)  =  cp(y1(x),  y2(x),  ...,  yk(x)),  k  <  n 
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such  that  cp  is  a  computationally  simple  function.  This  approach  could 
possibly  provide  for  more  compatible  properties  of  the  coefficients  which 
are  affecting  the  scaling  at  an  acceptable  cost,  all  solution  components 
y.,  j  =  1, ...  n  being  generated  at  the  same  time.   The  problem  of  automatic 
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scaling,  i.e.,  an  extension  of  the  E-method  to  a  floating-point  number 
representation  domain,  a  more  systematic  account  of  possible  applications, 
a  detailed  design  and  implementation  study  of  an  elementary  unit  are  same 
other  subjects  of  practical  interest.   The  E-method  can  be  incorporated 
in  a  computing  system  in  two  obvious  ways:  as  an  autonomous  arithmetic 
processor  with  several  elementary  units,  or,  by  providing  the  processors 
in  a  multiprocessor  system  with  the  capabilities  of  an  elementary  unit. 
Finally,  it  would  be  of  interest  to  consider  the  changes  in  an  instruction 
set  which  could  make  the  E-method  efficient  for  use  on  a  software  level. 
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